Drilling into Big Data with Apache Drill by Steven J. Vaughan-Nichols.
From the post:
Apache Drill’s goal is nothing less than answering queries from petabytes of data and trillions of records in less than a second.
You can’t claim that the Apache Drill programmers think small. Their design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in less than a second.
If this sounds impossible, or at least very improbable, consider that the NSA already seems to be doing exactly the same kind of thing. If they can do it, open-source software can do it.
In an interview at OSCon, the major open source convention in Portland, OR, Ted Dunning, the chief application architect for MapR, a big data company, and a Drill mentor and committer, explained the reason for the project. “There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data in such formats as Avro, the Apache Hadoop data serialization system; JSON (JavaScript Object Notation); and Protocol Buffers, Google’s data interchange format.”
As Dunning explained, big business wants fast access to big data and none of the traditional solutions, such as a relational database management system (RDBMS), MapReduce, or Hive, can deliver those speeds.
Dunning continued, “This need was identified by Google and addressed internally with a system called Dremel.” Dremel was the inspiration for Drill, which also is meant to complement such open-source big data systems as Apache Hadoop. The difference between Hadoop and Drill is that while Hadoop is designed to achieve very high throughput, it’s not designed to achieve the sub-second latency needed for interactive data analysis and exploration.
(…)
At this point, Drill is very much a work in progress. “It’s not quite production quality at this point, but by third or fourth quarter of 2013 it will become quite usable.” Specifically, Drill should be in beta by the third quarter.
So, if Drill sounds interesting to you, you can start contributing as soon as you get up to speed. To do that, there’s a weekly Google Hangout on Tuesdays at 9am Pacific time and a Twitter feed at @ApacheDrill. And, of course, there’s an Apache Drill Wiki and users’ and developers’ mailing lists.
NSA claims, or really any claims by government officials, have to be judged in light of President Obama’s announcement yesterday: “There is No Spying on Americans.”
It has been creeping up on us for a long time, but the age of Newspeak is here.
But leaving doubtful comments by members of the government to one side, Apache Drill does sound like an exciting project!