Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 16, 2012

Unix: Counting the number of commas on a line

Filed under: Awk,CSV — Patrick Durusau @ 10:42 am

Unix: Counting the number of commas on a line by Mark Needham.

From the post:

A few weeks ago I was playing around with some data stored in a CSV file and wanted to do a simple check on the quality of the data by making sure that each line had the same number of fields.

Mark offers two solutions to the problem, but concedes that more may exist.

A good first-round sanity check to run on data stored in a CSV file.
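As a rough illustration of the check, here is a minimal Python sketch. It is not taken from Mark's post; the function names `comma_counts` and `inconsistent_lines` and the script name in the usage comment are made up for this example. It counts the commas on each line and reports any line whose count differs from the first line's. It assumes a plain file with no quoted fields, since a comma inside a quoted field would be counted like any other.

```python
import sys


def comma_counts(lines):
    """Return the number of commas on each line, in order."""
    return [line.count(",") for line in lines]


def inconsistent_lines(lines):
    """Return (line_number, comma_count) pairs for every line whose comma
    count differs from the first line's. Line numbers are 1-based."""
    counts = comma_counts(lines)
    if not counts:
        return []
    expected = counts[0]  # assumption: the first line is the reference
    return [(n, c) for n, c in enumerate(counts, start=1) if c != expected]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_commas.py data.csv
    with open(sys.argv[1]) as handle:
        for number, count in inconsistent_lines(handle):
            print("line %d has %d commas" % (number, count))
```

If the script prints nothing, every line has the same number of commas as the first one. Treating the first line as the reference is a design choice for the sketch; taking the most common count would work just as well.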

Other one-liners you find useful for data analysis?

2 Comments

  1. You can do this in Python, too, much less cryptically.

    import sys
    for line in open(sys.argv[1]):
        # print the number of commas on this line
        print(len([x for x in line if x == ","]))

    Extending it so it either gives you the number of commas or tells you which line is inconsistent is 2-3 lines more in Python, but not so easy in tr/sed/awk. Those tools are IMHO obsolete now.

    Comment by larsga@garshol.priv.no — November 16, 2012 @ 1:07 pm

  2. Thanks!

    I’m not sure the first awk script is cryptic. 😉

    But as you say, a more recent tool provides more options.

    Comment by Patrick Durusau — November 16, 2012 @ 5:03 pm
