A Python Compiler for Big Data by Stephen Diehl.
From the post:
Blaze is the next generation of NumPy, Python’s extremely popular array library. At Continuum Analytics we aim to tackle some of the hardest problems in large data analytics with our Python stack of Numba and Blaze, which together will form the basis of a distributed computation and storage system that is simultaneously able to generate optimized machine code specialized to the data being operated on.
Blaze aims to extend the structural properties of NumPy arrays to a wider variety of table and array-like structures that support commonly requested features such as missing values, type heterogeneity, and labeled arrays.
Unlike NumPy, Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. Blaze is able to operate on datasets transparently as if they behaved like in-memory NumPy arrays.
We aim to allow analysts and scientists to productively write robust and efficient code, without getting bogged down in the details of how to distribute computation, or worse, how to transport and convert data between databases, formats, proprietary data warehouses, and other silos.
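To make "out-of-core" concrete, here is a minimal sketch of the underlying idea: computing a statistic over a dataset stored on disk while holding only one fixed-size chunk in memory at a time. This is not Blaze's API and not how Blaze is implemented. It uses only Python's standard library (`array`, `tempfile`, `os`), and the function names `write_doubles` and `chunked_mean` are hypothetical, chosen for illustration.

```python
import array
import os
import tempfile


def write_doubles(path, values):
    """Hypothetical helper: write an iterable of floats to a raw file of C doubles."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def chunked_mean(path, chunk_size=1024):
    """Hypothetical helper: mean of a file of doubles, reading at most chunk_size values at a time.

    Only one chunk is resident in memory, so the file can be larger than RAM.
    """
    itemsize = array.array("d").itemsize  # bytes per double (8 on common platforms)
    total = 0.0
    count = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_size * itemsize)  # read one chunk of raw bytes
            if not raw:
                break  # end of file
            chunk = array.array("d")
            chunk.frombytes(raw)  # decode bytes back into doubles
            total += sum(chunk)
            count += len(chunk)
    if count == 0:
        raise ValueError("empty dataset")
    return total / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        write_doubles(path, (float(i) for i in range(10_000)))
        print(chunked_mean(path, chunk_size=256))  # 4999.5
```

A system like the one described above would hide this chunking loop behind an array-like interface, so the analyst writes something that looks like an in-memory reduction while the chunking, storage format, and data movement are handled underneath.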
Just a thumbnail sketch but enough to get you interested in learning more.