Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 16, 2014

Spark: Parse CSV file and group by column value

Filed under: Linux OS,Spark — Patrick Durusau @ 7:24 pm

Spark: Parse CSV file and group by column value by Mark Needham.

Mark parses a 1GB file that details 4 million crimes from the City of Chicago.

He does it two ways: with Unix command-line tools and with Spark.

Results? One way took more than 2 minutes; the other, less than 10 seconds.

Place your bets with office staff and then visit Mark’s post for the results.
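If "group by column value" sounds abstract, here is a minimal sketch of the idea in plain Python. It is neither of Mark's two approaches, just an illustration of the task itself: count the rows for each distinct value in one column. The column name "Primary Type", the helper `count_by_column`, and the sample rows are assumptions made up for this example, not data from the post.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count rows per distinct value of `column` in CSV input with a header row."""
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


# Hypothetical sample rows standing in for the real 1GB file.
SAMPLE = """ID,Primary Type
1,THEFT
2,BATTERY
3,THEFT
"""

counts = count_by_column(io.StringIO(SAMPLE), "Primary Type")

# Print each value with its row count, most frequent first.
for value, n in counts.most_common():
    print(value, n)
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` wrapper. A single-threaded pass like this says nothing about which of Mark's two approaches is faster; for that, see his post.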
