Archive for the ‘Arrays’ Category

A Python Compiler for Big Data

Tuesday, December 18th, 2012

A Python Compiler for Big Data by Stephen Diehl.

From the post:

Blaze is the next generation of NumPy, Python’s extremely popular array library. At Continuum Analytics we aim to tackle some of the hardest problems in large data analytics with our Python stack of Numba and Blaze, which together will form the basis of a distributed computation and storage system which is simultaneously able to generate optimized machine code specialized to the data being operated on.

Blaze aims to extend the structural properties of NumPy arrays to a wider variety of table and array-like structures that support commonly requested features such as missing values, type heterogeneity, and labeled arrays.

(images omitted)

Unlike NumPy, Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. Blaze is able to operate on datasets transparently as if they behaved like in-memory NumPy arrays.

We aim to allow analysts and scientists to productively write robust and efficient code, without getting bogged down in the details of how to distribute computation, or worse, how to transport and convert data between databases, formats, proprietary data warehouses, and other silos.

Just a thumbnail sketch but enough to get you interested in learning more.
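
The out-of-core idea in the quoted post can be illustrated without Blaze itself. The sketch below is not Blaze's API; it uses NumPy's memmap as a stand-in to show how a file on disk can be treated like an in-memory array and reduced one chunk at a time. The names chunked_mean and demo, the chunk size, and the flat binary file layout are assumptions made up for illustration.

```python
import os
import tempfile

import numpy as np


def chunked_mean(path, n_items, chunk=1_000_000, dtype=np.float64):
    """Mean of a flat binary file of `dtype` values, read one chunk at a time."""
    # np.memmap maps the file into virtual memory; pages are loaded on access,
    # so the whole file never has to fit in RAM at once.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_items,))
    total = 0.0
    for start in range(0, n_items, chunk):
        # Each slice behaves like an ordinary NumPy array.
        total += float(data[start:start + chunk].sum())
    del data  # drop the mapping before the caller touches the file again
    return total / n_items


def demo():
    # A tiny stand-in for a file that would normally exceed memory.
    values = np.arange(10, dtype=np.float64)
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        values.tofile(path)
        return chunked_mean(path, n_items=values.size, chunk=3)
    finally:
        os.remove(path)
```

Calling demo() writes ten values to a temporary file and averages them three at a time. A system like the one the post describes would presumably hide the chunking entirely; here it is spelled out to make the mechanism visible.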

SciDB – Numeric Array Database (NAD)

Saturday, September 25th, 2010

SciDB announced its first source-code release in an Open Letter to the SciDB Community on 24 September 2010.

In Overview of SciDB, Large Scale Array Storage, Processing and Analysis, the SciDB team says scientific data differs from business data because:

  1. scientific analysis typically requires mathematically and algorithmically sophisticated data processing methods
  2. data generated by modern scientific instruments is extremely large

I don’t find those convincing.

The article also claims: “…scientific data has a necessary and implicit ordering; for each element or data value there are other values left, right, up, down, next, previous, or adjacent to it.”

The content of such arrays is always numeric data, so you can talk about numeric array databases.
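
The "left, right, up, down" ordering from the quoted passage is easy to see in a small two-dimensional array. What follows is a minimal sketch in plain Python, not SciDB's query language; the neighbors function and the sample grid are hypothetical and exist only to show that position in the array determines which values are adjacent.

```python
def neighbors(grid, row, col):
    """Return the values adjacent to grid[row][col], keyed by direction."""
    offsets = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    found = {}
    for name, (dr, dc) in offsets.items():
        r, c = row + dr, col + dc
        # Skip positions that fall outside the array bounds.
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            found[name] = grid[r][c]
    return found


# A 3x3 numeric array; the value 5 sits in the middle.
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
```

For the center cell, neighbors(grid, 1, 1) finds a value in all four directions; for the corner cell at (0, 0) only "down" and "right" exist. A row in a relational table carries no such built-in notion of adjacency.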

I find the overall approach refreshing because it isn’t aiming for a general solution to all data issues.

Instead, a solution for numeric data in an array.

Now if we can just get past the search for a general semantic solution.