Spark: Parse CSV file and group by column value by Mark Needham.
Mark parses a 1GB file that details 4 million crimes from the City of Chicago.
He does it two ways: with Unix command-line tools and with Spark.
The results? One approach took more than 2 minutes; the other took less than 10 seconds.
Place your bets with office staff and then visit Mark’s post for the results.
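
If "group by column value" is unfamiliar, it comes down to counting rows per distinct value in one column. Here is a minimal sketch in plain Python. It is not Mark's code and uses neither of his two approaches. The sample rows, the `Primary Type` column name, and the `count_by_column` helper are all assumptions made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the real file (assumed layout).
SAMPLE = """ID,Primary Type
1,THEFT
2,BATTERY
3,THEFT
"""


def count_by_column(lines, column):
    """Count rows per distinct value of `column` in CSV input with a header row."""
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


# Any iterable of lines works here, including an open file handle.
counts = count_by_column(io.StringIO(SAMPLE), "Primary Type")
for value, n in counts.most_common():
    print(value, n)
```

Because the rows are streamed one at a time, only the per-value counts are held in memory, so the same function should also work on a large file opened with `open(path, newline="")`.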