Visualizing High-Dimensional Data…

Visualizing High-Dimensional Data in the Browser with SVD, t-SNE and Three.js by Nicolas Kruchten.

From the post:

Data visualization, by definition, involves making a two- or three-dimensional picture of data, so when the data being visualized inherently has many more dimensions than two or three, a big component of data visualization is dimensionality reduction. Dimensionality reduction is also often the first step in a big-data machine-learning pipeline, because most machine-learning algorithms suffer from the Curse of Dimensionality: more dimensions in the input means you need exponentially more training data to create a good model. Datacratic’s products operate on billions of data points (big data) in tens of thousands of dimensions (big problem), and in this post, we show off a proof of concept for interactively visualizing this kind of data in a browser, in 3D (of course, the images on the screen are two-dimensional but we use interactivity, motion and perspective to evoke a third dimension).

Both the post and the demo are very impressive!

For a compelling review, see Dimension Reduction: A Guided Tour by Christopher J.C. Burges.

Christopher captures my concern with dimensional reduction in the first sentence of the introduction:

Dimension reduction1 is the mapping of data to a lower dimensional space such that uninformative variance in the data is discarded, or such that a subspace in which the data lives is detected.

I understand the need for dimensional reduction and that it can produce useful results. But what is being missed in the “…uniformative variance in the data…” is unknown.

Not an argument against dimensional reduction but a caution to avoid quickly dismissing variation in data as “uninformative.”

Comments are closed.