imMens: Real-time Visual Querying of Big Data by Zhicheng Liu, Biye Jiang and Jeffrey Heer.
Abstract:
Data analysts must make sense of increasingly large data sets, sometimes with billions or more records. We present methods for interactive visualization of big data, following the principle that perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records. We first describe a design space of scalable visual summaries that use data reduction methods (such as binned aggregation or sampling) to visualize a variety of data types. We then contribute methods for interactive querying (e.g., brushing & linking) among binned plots through a combination of multivariate data tiles and parallel query processing. We implement our techniques in imMens, a browser-based visual analysis system that uses WebGL for data processing and rendering on the GPU. In benchmarks imMens sustains 50 frames-per-second brushing & linking among dozens of visualizations, with invariant performance on data sizes ranging from thousands to billions of records.
Code is available at: https://github.com/StanfordHCI/imMens
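To make the data reduction idea in the abstract concrete, here is a minimal sketch of binned aggregation, the technique the authors name: raw records are collapsed into a fixed grid of counts, so the size of what gets rendered depends on the chosen bin resolution, not on the number of records. This is not the imMens implementation (which does the aggregation and rendering on the GPU via WebGL); the function name and parameters below are my own illustration in Python.

```python
# Sketch of binned aggregation as a data reduction method (illustrative only,
# not the authors' code). A 2D scatter of n records is reduced to a
# bins x bins grid of counts, whose size is independent of n.
import numpy as np

def binned_counts_2d(x, y, x_range, y_range, bins=50):
    """Aggregate raw (x, y) records into a bins x bins grid of counts."""
    counts, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, range=[x_range, y_range]
    )
    return counts, x_edges, y_edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1_000_000  # could be billions; the summary stays 50 x 50 either way
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    counts, _, _ = binned_counts_2d(x, y, (-4.0, 4.0), (-4.0, 4.0), bins=50)
    print(counts.shape, int(counts.sum()))  # (50, 50), ~n minus out-of-range points
```

Brushing and linking then operate on these precomputed summaries (the paper's multivariate data tiles) rather than on the raw records, which is what keeps interaction rates constant across data sizes.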
The emphasis on “real-time” with “big data” continues.
Impressive work, but I wonder: is there a continuum of “big data” when it comes to “real-time” access, analysis and/or visualization?
Some types of big data are simple enough for real-time analysis, others are less so, and for some, real-time analysis is simply inappropriate.
What I don’t know is what factors you would evaluate to place one big data set closer to one end of that continuum and another data set closer to the other.
Are you aware of any research on the appropriateness of “real-time” analysis of big data?
I first saw this in This Week in Research by Isaac Lopez.