Brief introduction to Scala and Breeze for statistical computing by Darren Wilkinson.
From the post:
In the previous post I outlined why I think Scala is a good language for statistical computing and data science. In this post I want to give a quick taste of Scala and the Breeze numerical library to whet the appetite of the uninitiated. This post certainly won’t provide enough material to get started using Scala in anger – but I’ll try and provide a few pointers along the way. It also won’t be very interesting to anyone who knows Scala – I’m not introducing any of the very cool Scala stuff here – I think that some of the most powerful and interesting Scala language features can be a bit frightening for new users.
To reproduce the examples, you need to install Scala and Breeze. This isn’t very tricky, but I don’t want to get bogged down with a detailed walk-through here – I want to concentrate on the Scala language and Breeze library. You just need to install a recent version of Java, then Scala, and then Breeze. You might also want SBT and/or the ScalaIDE, though neither of these are necessary. Then you need to run the Scala REPL with the Breeze library in the classpath. There are several ways one can do this. The most obvious is to just run scala with the path to Breeze manually specified (or specified in an environment variable). Alternatively, you could run a console from an sbt session with a Breeze dependency (which is what I actually did for this post), or you could use a Scala Worksheet from inside a ScalaIDE project with a Breeze dependency.
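As one concrete route (a sketch, not necessarily the author's exact setup): create an sbt project whose build file declares a Breeze dependency, then run `sbt console` to get a Scala REPL with Breeze on the classpath. The version numbers below are illustrative assumptions; substitute current releases.

```scala
// build.sbt — minimal sbt project with a Breeze dependency
// (version numbers are illustrative; use current releases)
scalaVersion := "2.13.12"

libraryDependencies += "org.scalanlp" %% "breeze" % "2.1.0"
```

Running `sbt console` in the project directory then drops you into a REPL where the Breeze packages can be imported directly.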
It will help if you have an interest in or a background in statistics, as Darren introduces you to statistical computing using Scala and the Breeze library.
Breeze is described as:
Breeze is a library for numerical processing, machine learning, and natural language processing. Its primary focus is on being generic, clean, and powerful without sacrificing (much) efficiency. Breeze is the merger of the ScalaNLP and Scalala projects, because one of the original maintainers is unable to continue development.
so you are likely to encounter it in several different contexts.
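As a tiny taste of what that looks like in practice (a sketch of my own, assuming only Breeze's `breeze.linalg` and `breeze.stats` packages, not an example from the post):

```scala
import breeze.linalg._
import breeze.stats._

// A dense vector of doubles
val x = DenseVector(1.0, 2.0, 3.0, 4.0, 5.0)

// Elementwise scaling
val scaled = x * 2.0   // DenseVector(2.0, 4.0, 6.0, 8.0, 10.0)

// Summary statistics from breeze.stats
val m = mean(x)        // 3.0
val s = stddev(x)      // sample standard deviation of x
```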
I find the move from “imperative” to “functional” programming similar to the move from normalized to denormalized data.
Normalization, done by design prior to processing, makes some tasks easier for a CPU. Denormalized data omits the normalization step (a burden on human operators) and shifts that work onto the CPU, if and when it is needed.
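To make the analogy concrete, here is a minimal plain-Scala sketch (my illustration, not from the post) of the same sum computed imperatively and functionally:

```scala
// Imperative: mutable accumulator, explicit loop — the programmer
// does the bookkeeping
var total = 0
for (i <- 1 to 5) total += i
// total == 15

// Functional: no mutation — the bookkeeping shifts to the runtime
val total2 = (1 to 5).sum
// or, spelling out the accumulation as a fold:
val total3 = (1 to 5).foldLeft(0)(_ + _)
// total2 == 15, total3 == 15
```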
Decreasing the burden on people and increasing the burden on CPUs doesn’t trouble me.
You?