Easy, Scalable Distributed Data Analysis
Anaconda is a distribution that combines the most popular Python packages for data analysis, statistics, and machine learning. It has several tools for a variety of types of cluster computations, including MapReduce batch jobs, interactive parallelism, and MPI.
All of the packages in Anaconda are built, tested, and supported by Continuum. Having a unified runtime for distributed data analysis makes it easier for the broader community to share code, examples, and best practices — without getting tangled in a mess of versions and dependencies.
Good way to avoid dependency issues!
On scaling, I am reminded of a developer who designed a Python application to require upgrading for “heavy” use. Much to their disappointment, Python scaled under “heavy” use with no need for an upgrade.
I saw this in Christophe Lalanne’s Bag of Tweets for July 2012.