2014 Data Science Salary Survey: Tools, Trends, What Pays (and What Doesn’t) for Data Professionals by John King and Roger Magoulas.
From the webpage:
For the second year, O’Reilly Media conducted an anonymous survey to expose the tools successful data analysts and engineers use, and how those tool choices might relate to their salary. We heard from over 800 respondents who work in and around the data space, and from a variety of industries across 53 countries and 41 U.S. states.
Findings from the survey include:
- Average number of tools and median income for all respondents
- Distribution of responses by a variety of factors, including age, location, industry, position, and cloud computing
- Detailed analysis of tool use, including tool clusters
- Correlation of tool usage and salary
Gain insight from these potentially career-changing findings—download this free report to learn the details, and plug your own variables into the regression model to find out where you fit into the data space.
The best take on this publication is David Smith’s O’Reilly Data Scientist Salary and Tools Survey, November 2014, where he notes:
The big surprise for me was the low ranking of NumPy and SciPy, two toolkits that are essential for doing statistical analysis with Python. In this survey and others, Python and R are often similarly ranked for data science applications, but this result suggests that Python is used about 90% for data science tasks other than statistical analysis and predictive analytics (my guess: mainly data munging). From these survey results, it seems that much of the “deep data science” is done by R.
My initial observation is that “over 800 respondents” is too small a sample to support any useful conclusions about the tools data scientists use, especially when the #1 tool listed in that survey was Windows.
Why a majority of “data scientists” confuse an OS with data processing tools like SQL or Excel, both of which ranked higher than Python or R, is unknown, but it casts further doubt on the data sample.
My suggestion would be to have a primary tool or language (other than an OS), whether it is R or Python, but to be familiar with the strengths of other approaches. Religious bigotry about approaches is a poor substitute for useful results.