R and Hadoop Data Analysis – RHadoop by Istvan Szegedi.
From the post:
R is a programming language and a software suite used for data analysis, statistical computing and data visualization. It is highly extensible and has object oriented features and strong graphical capabilities. At its heart R is an interpreted language and comes with a command line interpreter – available for Linux, Windows and Mac machines – but there are IDEs as well to support development like RStudio or JGR.
R and Hadoop can complement each other very well, they are a natural match in big data analytics and visualization. One of the most well-known R packages to support Hadoop functionalities is RHadoop that was developed by RevolutionAnalytics.
Nice introduction that walks you through installation and illustrates the use of RHadoop for analysis.
The ability to analyze “big data” is becoming commonplace.
The more routine that becomes, the greater the burden on the user to critically evaluate the analysis that produced the “answers.”
Yes, a repeatable analysis yielded answer X, but that only means that applying the same assumptions to the same data gave the same result.
The same could be said about division by zero, although no one would write home about it.
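To make the point concrete, here is a minimal sketch in Python (not RHadoop, and not taken from the post). The data and the `typical_value` helper are invented for illustration. It shows that rerunning an analysis under one assumption reproduces the answer, while changing the assumption changes it.

```python
from statistics import mean, median

# Hypothetical data: nine similar values and one extreme outlier.
incomes = [30, 32, 31, 29, 33, 30, 31, 32, 30, 1000]


def typical_value(data, summarize):
    """Return the 'typical' value under a chosen summary assumption."""
    return summarize(data)


# Same assumption, same data: every run reproduces the same answer.
runs = [typical_value(incomes, mean) for _ in range(3)]
repeatable = len(set(runs)) == 1

# A different but equally defensible assumption yields a different answer.
answer_mean = typical_value(incomes, mean)
answer_median = typical_value(incomes, median)

print("repeatable:", repeatable)
print("mean:", answer_mean, "median:", answer_median)
```

Repeatability confirms only that the computation is deterministic. Whether the mean or the median is the right summary for this data is a judgment the analysis itself cannot make.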