Zooming Through Historical Data with Streaming Micro Queries by Alex Woodie.
From the post:
Stream processing engines, such as Storm and S4, are commonly used to analyze real-time data as it flows into an organization. But did you know you can use this technology to analyze historical data too? A company called ZoomData recently showed how.
In a recent YouTube presentation, Zoomdata Justin Langseth demonstrated his company’s technology, which combines open source stream processing engines like Apache with data connection and visualization libraries based on D3.js.
“We’re doing data analytics and visualization a little differently than it’s traditionally done,” Langseth says in the video. “Legacy BI tools will generate a big SQL statement, run it against Oracle or Teradata, then wait for two to 20 to 200 seconds before showing it to the user. We use a different approach based on the Storm stream processing engine.”
Once hooked up to a data source–such as Cloudera Impala or Amazon Redshift–data is then fed into the Zoomdata platform, which performs calculations against the data as it flows in, “kind of like continues event processing but geared more toward analytics,” Langseth says.
From the video description:
In this hands-on webcast you’ll learn how LivePerson and Zoomdata perform stream processing and visualization on mobile devices of structured site traffic and unstructured chat data in real-time for business decision making. Technologies include Kafka, Storm, and d3.js for visualization on mobile devices. Byron Ellis, Data Scientist for LivePerson will join Justin Langseth of Zoomdata to discuss and demonstrate the solution.
After watching the video, what do you think the concept of “micro queries?”
I ask because I don’t know of any technical reason why a “large” query could not stream out interim results and display those as more results were arriving.
Visualization isn’t usually done that way but that brings me to my next question: Assuming we have interim results visualized, how useful are interim results? Being actionable on interim results really depends on the domain.
I rather like Zoomdata’s emphasis on historical data and the video is impressive.
You can download a VM at Zoomdata.
If you can think of upsides/downsides to the interim results issue, please give a shout!