Group by column one, then sum column two to list the total for each entry in column one

I am a bit of a Bash newbie, so please bear with me here.

I have a text file dumped by another piece of software (which I have no control over). It lists each user with the number of times they accessed a certain resource, and looks like this:

Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim  79
Bob  70
John 67
Sean 62
Mark 59
Richard 58
Jim  57
Bob  55
John 49
Sean 48
Mark 46
.
.
.

My goal here is to get an output like this.

Jim  [Total for Jim]
Bob  [Total for Bob]
John [Total for John]

And so on.

The names change each time I run the query in the software, so a static search for each name piped through wc does not help.

Answers


This sounds like a job for awk :) Pipe the output of your program to the following awk script:

your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'

Output (awk does not guarantee the order of a for-in loop, so the lines may appear in a different order):

Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208

The awk script itself can be explained better in this format:

# executed on each line
{
  # 'a' is an associative array. awk creates it
  # as an empty array on its first use
  # '$1' contains the first column - the name
  # '$2' contains the second column - the amount
  #
  # on every line the total score of 'name'
  # will be incremented by 'amount'
  a[$1]+=$2
}
# executed at the end of input
END{
  # print every name and its score
  for(name in a)print name " " a[name]
}
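
If the totals should come out in the order in which each name first appears in the input, one possible variation is to record every new name in a second, numerically indexed array and loop over that in the END block. This is only a sketch: the function name sum_in_input_order and the array names total and order are made up for illustration, and the NF check is an added assumption so that blank or incomplete lines are ignored.

```shell
# sum_in_input_order is a hypothetical helper name, not part of the original answer.
sum_in_input_order() {
  awk '
    NF < 2 { next }                        # skip blank or incomplete lines
    !($1 in total) { order[++n] = $1 }     # remember each name the first time it appears
    { total[$1] += $2 }                    # same running total as in the one-liner
    END {
      for (i = 1; i <= n; i++) print order[i], total[order[i]]
    }
  '
}

# Usage with a few lines taken from the question:
printf '%s\n' 'Jim 109' 'Bob 94' 'John 92' 'Jim  79' 'Bob  70' | sum_in_input_order
# prints:
# Jim 188
# Bob 164
# John 92
```

The `$1 in total` test only checks whether the key exists and does not create it, so a name is appended to order exactly once.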

Note: to get the output sorted by total, you can add another pipe to sort -k2,2nr. -k2,2 restricts the sort key to the second column, n compares it as a number, and r reverses the order so the highest total comes first:

your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -k2,2nr

Output:

Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
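
One thing worth checking when sorting the totals is whether sort compares the second column as text or as a number. The two agree in the output above because every total has three digits, but they can differ once the digit counts vary. The sketch below uses made-up names and totals (they are not from the question) to compare a text key with a numeric key:

```shell
# Hypothetical totals with different digit counts, for illustration only.
totals='Ann 95
Bob 219
Cy 1200'

# Text comparison of the second column: "95" ranks above "219" and "1200",
# because the characters are compared one by one ('9' > '2' > '1').
by_text=$(printf '%s\n' "$totals" | sort -r -k2)

# Numeric comparison of the second column only, largest first.
by_number=$(printf '%s\n' "$totals" | sort -k2,2nr)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$by_text" "$by_number"
```

With the text key Ann 95 is listed first, while the numeric key puts Cy 1200 first.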
