
The Data Science Toolkit is now on Vagrant!

Tuesday, January 29th, 2013

The Data Science Toolkit is now on Vagrant! by Pete Warden.

From the post:

I have fallen in love with Vagrant over the last year; it turns an entire logical computer into a single unit of software. In simple terms, you can easily set up, run, and maintain a virtual machine image with all the frameworks and data dependencies pre-installed. You can wipe it, copy it to a different system, branch it to run experimental changes, keep multiple versions around, easily share it with other people, and quickly deploy multiple copies when you need to scale up. It’s as revolutionary as the introduction of distributed source control systems: you’re suddenly free to innovate because mistakes can be painlessly rolled back, and you can collaborate with other people without worrying that anything will be overwritten.

Before I discovered Vagrant, I’d attempted to do something similar with my Data Science Toolkit package, distributing a VMware image of a full Linux system with all the software and data it required pre-installed. It was a large download, and a lot of people used it, but the setup took more work than I liked. Vagrant solved a lot of the usability problems around downloading VMs, so I’ve been eager to create a compatible version of the DSTK image. I finally had a chance to get that working over the weekend, so you can create your own local geocoding server just by running:

vagrant box add dstk http://static.datasciencetoolkit.org/dstk_0.41.box

vagrant init dstk

vagrant up

The box itself is almost 5 GB with all the address data, so the download may take a while. Once it’s done, go to http://localhost:8080 and you’ll see the web interface to the geocoding and unstructured data parsing functions.
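Once the VM is running, the geocoder can presumably be called from code as well as from the web interface. The following is a minimal Python sketch, not taken from the post, assuming the box serves DSTK's `street2coordinates` endpoint at http://localhost:8080 and returns JSON. The endpoint path, the response shape, and the helper names `street2coordinates_url` and `geocode` are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Assumption: the DSTK web server is reachable on host port 8080,
# as the http://localhost:8080 address in the post suggests.
DSTK_BASE = "http://localhost:8080"


def street2coordinates_url(address: str, base: str = DSTK_BASE) -> str:
    """Build a GET URL for the (assumed) street2coordinates endpoint."""
    # Percent-encode everything, including spaces and commas, so the
    # whole address fits in a single URL path segment.
    return f"{base}/street2coordinates/{urllib.parse.quote(address, safe='')}"


def geocode(address: str, base: str = DSTK_BASE, timeout: float = 10.0) -> Any:
    """Fetch one address from the local server and decode the JSON reply."""
    url = street2coordinates_url(address, base)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    addr = "1600 Pennsylvania Ave NW, Washington, DC 20500"
    try:
        print(json.dumps(geocode(addr), indent=2))
    except (OSError, ValueError) as err:
        # OSError covers connection failures and HTTP errors (URLError is a
        # subclass); ValueError covers a reply that is not valid JSON.
        print(f"Could not geocode via {DSTK_BASE}: {err}")
```

If the box forwards the server to a different port, only `DSTK_BASE` needs to change; the error handling just reports a server that is not up yet instead of crashing.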

Vagrant is built on Oracle’s VirtualBox, and this looks like a very cool way to distribute topic map applications along with their data.

Remember the Emulate Drug Dealers [Marketing Topic Maps] post?

I was very serious.