How can I split up this string?

I am currently trying to sanitize some log files so they are in an easier-to-read format. I have been using the GNU cut command, which works fairly well, but I cannot think of a good way to remove the [INFO] part of each line:

logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
logs/logs/server_1282136782.log:2010-08-18 16:27:32 [INFO] <pinguin> <pinguin>§F :/
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <TotempaaltJ> <TotempaaltJ>§F That helped A LOT
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <Rizual> §b<Rizual>§F hm?
logs/logs/server_1282136782.log:2010-08-18 16:29:10 [INFO] <pinguin> <pinguin>§F bah
logs/logs/server_1282136782.log:2010-08-18 16:29:35 [INFO] <TotempaaltJ> <TotempaaltJ>§F Finished my houses 
logs/logs/server_1282136782.log:2010-08-18 16:29:40 [INFO] <TotempaaltJ> <TotempaaltJ>§F or whatever
logs/logs/server_1282136782.log:2010-08-18 16:30:47 [INFO] <Rizual> §b<Rizual>§So much iron
logs/logs/server_1282136782.log:2010-08-18 16:30:58 [INFO] <TotempaaltJ> <TotempaaltJ>§F Ah yes, furnaces don't work.o
logs/logs/server_1282136782.log:2010-08-18 16:31:01 [INFO] <Rizual> §b<Rizual>§F They do
logs/logs/server_1282136782.log:2010-08-18 16:31:06 [INFO] <TotempaaltJ> <TotempaaltJ>§F Hm
logs/logs/server_1282136782.log:2010-08-18 16:31:08 [INFO] <Rizual> §b<Rizual>§F just need to use /lighter
logs/logs/server_1282136782.log:2010-08-18 16:31:12 [INFO] <Valrix> <Valrix>§FNotch fixed them?

Ultimately I want to get the lines down to something that resembles the following. Keep in mind that the logs come in two formats: the older format, which has two copies of the name (as in the bulk of the logs above), and the newer format, which has the name only once (as in the first log line, the <NateMar> one).

2010-08-31 23:06:51 <NateMar> where?!    
2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh (this one would require both the same editing as above, plus removal of the "extra" name §b<BoonTheMoon>§)    

How should I go about doing this? I have thought about using awk, but I'm having a difficult time getting a grip on how it would work, so I'm not sure how to set it up. Any help would be greatly appreciated, thanks!

Answers


More takes on this, in sed, awk and bash:

[ghoti@pc ~]$ cat text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ sed 's/^[^:]*://;s/[[][^]]*[]] //' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ awk '{sub(/^[^:]+:/,""); $3=""} 1' text
2010-08-31 23:06:51  <NateMar> where?!
2010-08-15 22:59:53  <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ while IFS= read -r line; do line=${line#*:}; echo "${line/\[*\] }"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

While these are simple, they trade some correctness for brevity. For example, the awk script empties the third field but leaves both of the spaces that delimited it, so its output contains a double space.

Note that as "elegant" as one-liners may seem for quick jobs, it's usually a better idea to be explicit with your code, especially when you have to deal with unknown input data or if you won't be inspecting your results immediately after you run things.

This is harder to read, but could be much safer, depending on your input:

[ghoti@pc ~]$ awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} 1' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

For the bash script, you'd be safer to use a character class rather than a glob:

[ghoti@pc ~]$ shopt -s extglob
[ghoti@pc ~]$ while IFS= read -r line; do line=${line#*:}; echo "${line/\[+([[:upper:]])\] /}"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

Note that the extglob shopt option lets you use extended pattern-matching operators, such as +(...), inside the parameter replacement pattern. Run man bash and look under Pathname Expansion for details.
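To make the difference between the plain glob and the character class concrete, here is a small sketch. The sample line is made up for illustration (it is not from the logs above): its message contains a second bracketed word, which is the kind of input where the two patterns diverge.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical line: the message itself contains a second "[...] " group.
line='2010-08-18 16:31:12 [INFO] <Valrix> see [this] now'

# Plain glob: bash replaces the longest match, so * runs on to the last "] ".
greedy=${line/\[*\] /}

# extglob class: only a run of uppercase letters may sit between the brackets.
strict=${line/\[+([[:upper:]])\] /}

printf '%s\n' "$greedy" "$strict"
```

The first line printed should be `2010-08-18 16:31:12 now`, because the glob swallowed part of the message; the second keeps the message intact as `2010-08-18 16:31:12 <Valrix> see [this] now`.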

UPDATE:

You've added a new requirement to your question that wasn't there originally. Here's how you can achieve your new requirement with awk:

awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} $3~/^<.+>$/{sub(/^(§b)?<[[:alpha:]]+>§/,"",$4)} 1' text

This simply removes the coloured nickname from the start of the 4th field, if the 3rd field looks like a nickname in angle brackets. This works for the sample you posted, but only you can determine whether this will work for you.
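The same idea can probably be expressed in sed as well. The following is only a sketch, assuming GNU sed (it relies on a backreference inside an extended regex) and assuming the duplicate always directly follows the first nickname; the function name clean_log is made up for this example. Like the awk version, it strips the repeated nickname and the § after it, but leaves any colour letter that follows (such as the F in §F).

```bash
# clean_log (hypothetical helper): raw log lines on stdin, cleaned lines on stdout.
#   1st expression: drop the file name, i.e. everything up to the first colon
#   2nd expression: drop the [TAG] that follows the date and time
#   3rd expression: drop an optional "§b", the repeated <nick> (\2), and the "§"
clean_log() {
  sed -E \
    -e 's/^[^:]*://' \
    -e 's/^([^ ]+ [^ ]+) \[[[:upper:]]+\]/\1/' \
    -e 's/^([^ ]+ [^ ]+ (<[^>]+>) )(§b)?\2§/\1/'
}

printf '%s\n' \
  'logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh' |
  clean_log
```

For that sample line this should print `2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh`. Lines in the newer format, with only one nickname, pass through the third expression unchanged.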

And with bash:

shopt -s extglob
while read date time tag nick line; do
  printf "%s %s %s %s\n" "${date#*:}" "$time" "$nick" "${line/#*([^< ])$nick??}"
done < text

You're on the right track using the cut command. The key to removing the [INFO] field is to exclude it from the final output. The first cut drops everything up to the first colon (the file name). The second cut splits on spaces, and its -f1,2,4- argument keeps every field except the 3rd, which is just [INFO] at that point.

cut -d: -f2- Input.txt | cut -d' ' -f1,2,4- > Output.txt    
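As a quick check of the pipeline above, here it is wrapped in a function (the name strip_tag is made up for this example) and fed one of the sample lines whose timestamp and message both contain colons. Because -f2- keeps everything after the first colon, those later colons should survive. Note that this sketch does not remove the duplicated nickname.

```bash
# strip_tag (hypothetical helper): same two cut stages, reading from stdin.
strip_tag() {
  cut -d: -f2- | cut -d' ' -f1,2,4-
}

printf '%s\n' \
  'logs/logs/server_1282136782.log:2010-08-18 16:27:32 [INFO] <pinguin> <pinguin>§F :/' |
  strip_tag
```

This should print `2010-08-18 16:27:32 <pinguin> <pinguin>§F :/`.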
