A new data processing workflow for R: dplyr, magrittr, tidyr, ggplot2
From the post:
Over the last year I have changed my data processing and manipulation workflow in R dramatically. Thanks to some great new packages like dplyr, tidyr and magrittr (as well as the less-new ggplot2) I've been able to streamline code and speed up processing. Up until 2014, I had used essentially the same R workflow (aggregate, merge, apply/tapply, reshape etc) for more than 10 years. I have added a few improvements over the years in the form of functions in packages doBy, reshape2 and plyr and I also flirted with the package data.table (which I found to be much faster for big datasets but the syntax made it difficult to work with), but the basic flow has remained remarkably similar. Until now…
Given how much I've enjoyed the speed and clarity of the new workflow, I thought I would share a quick demonstration.
In this example, I am going to grab data from a sample SQL database provided by Google via Google BigQuery and then give examples of manipulation using dplyr, magrittr and tidyr (and ggplot2 for visualization).…
This is a great introduction to a workflow in R that you can generalize for your own purposes.
Word counts won’t impress your English professor, but you will have a base for deeper analysis of Shakespeare.
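To give a flavor of the style without setting up BigQuery access, here is a minimal sketch of my own, not code from the post. It assumes a small in-memory data frame with made-up word counts (the column names corpus, word and word_count and every value are invented for illustration), and it uses the pipe that dplyr re-exports from magrittr. I used tidyr's spread() for the reshaping step; newer tidyr versions also offer pivot_wider() for the same job.

```r
library(dplyr)
library(tidyr)

# Made-up rows for illustration only; these are not real counts from any dataset.
words <- data.frame(
  corpus     = c("hamlet", "hamlet", "hamlet", "macbeth", "macbeth"),
  word       = c("the", "king", "ghost", "the", "king"),
  word_count = c(10, 4, 2, 7, 5),
  stringsAsFactors = FALSE
)

# dplyr verbs chained with the magrittr pipe: total word count per corpus,
# largest first.
totals <- words %>%
  group_by(corpus) %>%
  summarise(total = sum(word_count)) %>%
  arrange(desc(total))

# tidyr: reshape to one row per word and one column per corpus;
# word/corpus pairs that never occur are filled with 0.
wide <- words %>%
  spread(corpus, word_count, fill = 0)

print(totals)
print(wide)
```

Each step reads top to bottom as "take this, then do that", which is most of what makes the piped style easier to follow than nested calls to aggregate and reshape.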
I first saw this in a tweet by Christophe Lalanne.