Archive for the ‘Datamash’ Category


Wednesday, August 6th, 2014

GNU datamash

From the homepage:

GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.

To which you might reasonably ask: What basic numeric, textual and statistical operations?

From the manual:

File operations: transpose, reverse

Numeric operations: sum, min, max, absmin, absmax

Textual/Numeric operations: count, first, last, rand, unique, collapse, countunique

Statistical operations: mean, median, q1, q3, iqr, mode, antimode, pstdev, sstdev, pvar, svar, mad, madraw, sskew, pskew, skurt, pkurt, jarque, dpo
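To get a feel for what a few of these operations compute, here is a minimal Python sketch that uses only the standard library. It is not how datamash is implemented; the function name summarize_by_group and the sample data are made up for illustration. It groups tab-separated rows by the first column and reports sum, min, max, mean, median, pstdev and sstdev of the second.

```python
import statistics
from collections import defaultdict


def summarize_by_group(text, sep="\t"):
    """Group rows by column 1 and summarize the numeric values in column 2.

    Hypothetical helper for illustration only; not part of datamash.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(sep)[:2]
        groups[key].append(float(value))

    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # population standard deviation (divides by n)
            "pstdev": statistics.pstdev(values),
            # sample standard deviation (divides by n - 1); needs two or more values
            "sstdev": statistics.stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Made-up sample input: group label, TAB, numeric value
sample = "a\t1\na\t3\nb\t10\nb\t20\nb\t30\n"
for key, stats in summarize_by_group(sample).items():
    print(key, stats)
```

With datamash itself, the rough equivalent would be a one-liner along the lines of datamash -g 1 sum 2 mean 2 on the same input, which is the whole appeal.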

The default column separator is TAB, but you can specify another character with the -t (--field-separator) option.
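As a rough illustration of how the separator interacts with a file operation such as transpose, here is a small Python sketch. The transpose function below is a hypothetical stand-in written for this post, not datamash code, and it assumes every row has the same number of fields.

```python
def transpose(text, sep="\t"):
    """Swap rows and columns of delimited text.

    Hypothetical helper for illustration; assumes every row has the
    same number of fields (zip would silently drop extras otherwise).
    """
    rows = [line.split(sep) for line in text.splitlines() if line]
    return "\n".join(sep.join(column) for column in zip(*rows))


# TAB-separated input, matching the default separator
print(transpose("x\ty\n1\t2\n3\t4"))

# Comma-separated input, with the separator passed explicitly
print(transpose("x,y\n1,2\n3,4", sep=","))
```

The only thing that changes between the two calls is the separator argument, which mirrors how a single option switches datamash from TAB to another delimiter.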

Looks like a great utility to have in your data mining toolbox.

I first saw this in a tweet by Joe Pickrell.