Parse a CSV file using gawk

How do you parse a CSV file using gawk? Simply setting FS="," is not enough, because a quoted field that contains a comma is split into multiple fields.

Example using FS="," which does not work:

file contents:

one,two,"three, four",five
"six, seven",eight,"nine"

gawk script:

BEGIN { FS="," }
{
  for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
  printf "---------------------------\n"
}

bad output:

field #1: one
field #2: two
field #3: "three
field #4:  four"
field #5: five
---------------------------
field #1: "six
field #2:  seven"
field #3: eight
field #4: "nine"
---------------------------

desired output:

field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------

Answers


The short answer is "I wouldn't use gawk to parse CSV if the CSV contains awkward data", where 'awkward' means things like commas in the CSV field data.

The next question is "What other processing are you going to be doing?", since that will influence which alternatives you use.

I'd probably use Perl and the Text::CSV or Text::CSV_XS modules to read and process the data. Remember, Perl was originally written in part as an awk and sed killer - hence the a2p and s2p programs, which convert awk and sed scripts (respectively) into Perl. They were distributed with Perl for many years; since Perl 5.22 they have been available separately from CPAN.

