Scala as a platform…

Scala as a platform for statistical computing and data science by Darren Wilkinson

From the post:

There has been a lot of discussion on-line recently about languages for data analysis, statistical computing, and data science more generally. I don’t really want to go into the detail of why I believe that all of the common choices are fundamentally and unfixably flawed – language wars are so unseemly. Instead I want to explain why I’ve been using the Scala programming language recently and why, despite being far from perfect, I personally consider it to be a good language to form a platform for efficient and scalable statistical computing. Obviously, language choice is to some extent a personal preference, implicitly taking into account subjective trade-offs between features different individuals consider to be important. So I’ll start by listing some language/library/ecosystem features that I think are important, and then explain why.

A feature wish list

It should:

  • be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks
  • be free, open-source and platform independent
  • be fast and efficient
  • have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra
  • have a strong type system, and be statically typed with good compile-time type checking and type safety
  • have reasonable type inference
  • have a REPL for interactive use
  • have good tool support (including build tools, doc tools, testing tools, and an intelligent IDE)
  • have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design
  • allow imperative programming for those (rare) occasions where it makes sense
  • be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications

The not-very-surprising punch-line is that Scala ticks all of those boxes and that I don’t know of any other languages that do. But before expanding on the above, it is worth noting a couple of (perhaps surprising) omissions. For example:

  • have excellent data viz capability built-in
  • have vast numbers of statistical routines in the standard library

Darren reviews Scala on each of these points.

Although he still uses R and Python, Darren has hopes for future development of Scala into a full featured data mining platform.

Perhaps his checklist will contribute the requirements needed to make that one of the futures of Scala.

I first saw this in Christophe Lalanne’s A bag of tweets / December 2013.

Comments are closed.