Archive for the ‘Arrays’ Category

APL in R “The past isn’t dead. It isn’t even past.”*

Monday, March 14th, 2016

APL in R by Jan de Leeuw and Masanao Yajima.

From the introduction:

APL was introduced by Iverson (1962). It is an array language, with many functions to manipulate multidimensional arrays. R also has multidimensional arrays, but not as many functions to work with them.

In R there are no scalars, there are vectors of length one. For a vector x in R we have dim(x) equal to NULL and length(x) > 0. For an array, including a matrix, we have length(dim(x)) > 0. APL is an array language, which means everything is an array. For each array both the shape ⍴A and the rank ⍴⍴A are defined. Scalars are arrays with shape equal to one, vectors are arrays with rank equal to one.
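To make the shape and rank distinction concrete, here is a minimal sketch in plain Python, using nested lists as a stand-in for arrays. The helper names `shape` and `rank` are my own and come from neither APL nor R. The sketch assumes the nested lists are rectangular, and giving a bare number an empty shape and rank zero is a choice of this sketch.

```python
def shape(a):
    """Extent along each axis of a rectangular nested list (the analogue of APL's shape)."""
    dims = []
    # Walk down the first element at each nesting level, recording its length.
    while isinstance(a, list):
        dims.append(len(a))
        if len(a) == 0:
            break  # an empty axis has nothing further to descend into
        a = a[0]
    return dims


def rank(a):
    """Number of axes, i.e. the length of the shape (the analogue of shape-of-shape)."""
    return len(shape(a))


matrix = [[1, 2, 3], [4, 5, 6]]
vector = [1, 2, 3]

print(shape(matrix), rank(matrix))  # [2, 3] 2
print(shape(vector), rank(vector))  # [3] 1
print(shape(5), rank(5))            # [] 0
```

A vector is then simply anything whose rank is one, which matches the definition quoted above.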

If you want to evaluate APL expressions using a traditional APL virtual keyboard, we recommend the nice webpage at ngn.github.io/apl/web/index.html. EliStudio at fastarray.appspot.com/default.html is essentially an APL interpreter running in a Qt GUI, using ascii symbols and symbol-pairs to replace traditional APL symbols (Chen and Ching (2013)). Eli does not have nested arrays. It does have ecc, which compiles eli to C.

In 1994 one of us coded most APL array operations in XLISP-STAT. The code is still available at gifi.stat.ucla.edu/apl.

I am certain this will be useful for R programmers. More generally, I am curious whether there is a genealogy of functions across programming languages.

Enjoy!

*Apologies to William Faulkner.

A Python Compiler for Big Data

Tuesday, December 18th, 2012

A Python Compiler for Big Data by Stephen Diehl.

From the post:

Blaze is the next generation of NumPy, Python’s extremely popular array library. At Continuum Analytics we aim to tackle some of the hardest problems in large data analytics with our Python stack of Numba and Blaze, which together will form the basis of a distributed computation and storage system which is simultaneously able to generate optimized machine code specialized to the data being operated on.

Blaze aims to extend the structural properties of NumPy arrays to a wider variety of table and array-like structures that support commonly requested features such as missing values, type heterogeneity, and labeled arrays.

Unlike NumPy, Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. Blaze is able to operate on datasets transparently as if they behaved like in-memory NumPy arrays.

We aim to allow analysts and scientists to productively write robust and efficient code, without getting bogged down in the details of how to distribute computation, or worse, how to transport and convert data between databases, formats, proprietary data warehouses, and other silos.

Just a thumbnail sketch but enough to get you interested in learning more.
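The out-of-core idea in the quoted passage can be illustrated without Blaze at all. What follows is a rough sketch in plain Python, not Blaze's API: the function names `chunks` and `streaming_mean` and the `chunk_size` parameter are hypothetical. It computes a mean while holding only one fixed-size block of values in memory at a time.

```python
from itertools import islice


def chunks(values, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def streaming_mean(values, chunk_size=1024):
    """Mean of an iterable, computed one chunk at a time.

    Only one chunk is held in memory, so the source may be far larger
    than RAM (for example, a generator that reads lines from a file).
    """
    total = 0.0
    count = 0
    for block in chunks(values, chunk_size):
        total += sum(block)
        count += len(block)
    if count == 0:
        raise ValueError("cannot take the mean of no data")
    return total / count


print(streaming_mean(range(1, 101), chunk_size=7))  # 50.5
```

The caller never sees the chunking, which is the same kind of transparency the post describes, though a real system also has to handle storage, distribution, and operations that cannot be reduced chunk by chunk.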

SciDB – Numeric Array Database (NAD)

Saturday, September 25th, 2010

SciDB announced its first source-code release in its Open Letter to the SciDB Community on 24 September 2010.

In Overview of SciDB, Large Scale Array Storage, Processing and Analysis, the SciDB team says scientific data differs from business data because:

  1. scientific analysis typically requires mathematically and algorithmically sophisticated data processing methods
  2. data generated by modern scientific instruments is extremely large

I don’t find those convincing.

The article also claims: “…scientific data has a necessary and implicit ordering; for each element or data value there are other values left, right, up, down, next, previous, or adjacent to it.”
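The "implicit ordering" claim is easy to see in code. Below is a small sketch in plain Python, not anything from SciDB: the function name `neighbors` and the use of a list of lists as the grid are my own assumptions. Position alone is enough to find the adjacent values, with no keys or joins involved.

```python
def neighbors(grid, row, col):
    """Return the values adjacent to grid[row][col], keyed by direction.

    Cells on an edge simply have fewer neighbors.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    offsets = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    found = {}
    for name, (d_row, d_col) in offsets.items():
        r, c = row + d_row, col + d_col
        # Keep only positions that fall inside the grid.
        if 0 <= r < n_rows and 0 <= c < n_cols:
            found[name] = grid[r][c]
    return found


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

print(neighbors(grid, 1, 1))  # {'up': 2, 'down': 8, 'left': 4, 'right': 6}
print(neighbors(grid, 0, 0))  # {'down': 4, 'right': 2}
```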

The content of such arrays is always numeric data, so one can speak of numeric array databases.

I find the overall approach refreshing because it isn’t aiming for a general solution to all data issues.

Instead, a solution for numeric data in an array.

Now if we can just get past the search for a general semantic solution.