Archive for the ‘Presto’ Category

Airbnb open sources SQL tool built on Facebook’s Presto database

Friday, March 6th, 2015

Airbnb open sources SQL tool built on Facebook’s Presto database by Derrick Harris.

From the post:

Apartment-sharing startup Airbnb has open sourced a tool called Airpal that the company built to give more of its employees access to the data they need for their jobs. Airpal is built atop the Presto SQL engine that Facebook created in order to speed access to data stored in Hadoop.

Airbnb built Airpal about a year ago so that employees across divisions and roles could get fast access to data rather than having to wait for a data analyst or data scientist to run a query for them. According to product manager James Mayfield, it’s designed to make it easier for novices to write SQL queries by giving them access to a visual interface, previews of the data they’re accessing, and the ability to share and reuse queries.

It sounds a little like the types of tools we often hear about inside data-driven companies like Facebook, as well as the new SQL platform from a startup called Mode.

At this point, Mayfield said, “Over a third of all the people working at Airbnb have issued a query through Airpal.” He added, “The learning curve for SQL doesn’t have to be that high.”

From the GitHub page:

Airpal is a web-based, query execution tool which leverages Facebook’s PrestoDB to make authoring queries and retrieving results simple for users. Airpal provides the ability to find tables, see metadata, browse sample rows, write and edit queries, then submit queries all in a web interface. Once queries are running, users can track query progress and when finished, get the results back through the browser as a CSV (download it or share it with friends). The results of a query can be used to generate a new Hive table for subsequent analysis, and Airpal maintains a searchable history of all queries run within the tool.


  • Optional Access Control
  • Syntax highlighting
  • Results exported to a CSV for download or a Hive table
  • Query history for self and others
  • Saved queries
  • Table finder to search for appropriate tables
  • Table explorer to visualize schema of table and first 1000 rows


  • Java 7 or higher
  • MySQL database
  • Presto 0.77 or higher
  • S3 bucket (to store CSVs)
  • Gradle 2.2 or higher

I understand to some degree the need to make SQL “simpler” but fail to see how simpler controls translate into a solution. The controls may be obvious enough but if I don’t know the semantics of the column headers, the simplicity of the interface won’t be terribly helpful.

Or to put it another way, users seem to be assumed to know the semantics of the tables they encounter. True/False?

Facebook’s Presto 10X Hive Speed (mostly)

Friday, November 8th, 2013

Facebook open sources its SQL-on-Hadoop engine, and the web rejoices by Derrick Harris.

From the post:

Facebook has open sourced Presto, the interactive SQL-on-Hadoop engine the company first discussed in June. Presto is Facebook’s take on Cloudera’s Impala or Google’s Dremel, and it already has some big-name fans in Dropbox and Airbnb.

Technologically, Presto and other query engines of its ilk can be viewed as faster versions of Hive, the data warehouse framework for Hadoop that Facebook created several years ago. Facebook and many other Hadoop users still rely heavily on Hive for batch-processing jobs such as regular reporting, but there has been a demand for something letting users perform ad hoc, exploratory queries on Hadoop data similar to how they might do them using a massively parallel relational database.

Presto is 10 times faster than Hive for most queries, according to Facebook software engineer Martin Traverso in a blog post detailing today’s news.

I think my headline is the more effective one. 😉

You won’t know anything until you download Presto, read the documentation, etc.

Presto homepage.

The first job is to get your attention, then you have to get the information necessary to be informed.

From Derrick’s post, which points to other SQL-on-Hadoop options, interesting times are ahead!

Presto is Coming!

Sunday, June 9th, 2013

Facebook unveils Presto engine for querying 250 PB data warehouse by Jordan Novet.

From the post:

At a conference for developers at Facebook headquarters on Thursday, engineers working for the social networking giant revealed that it’s using a new homemade query engine called Presto to do fast interactive analysis on its already enormous 250-petabyte-and-growing data warehouse.

More than 850 Facebook employees use Presto every day, scanning 320 TB each day, engineer Martin Traverso said.

“Historically, our data scientists and analysts have relied on Hive for data analysis,” Traverso said. “The problem with Hive is it’s designed for batch processing. We have other tools that are faster than Hive, but they’re either too limited in functionality or too simple to operate against our huge data warehouse. Over the past few months, we’ve been working on Presto to basically fill this gap.”

Facebook created Hive several years ago to give Hadoop some data warehouse and SQL-like capabilities, but it is showing its age in terms of speed because it relies on MapReduce. Scanning over an entire dataset could take many minutes to hours, which isn’t ideal if you’re trying to ask and answer questions in a hurry.

With Presto, however, simple queries can run in a few hundred milliseconds, while more complex ones will run in a few minutes, Traverso said. It runs in memory and never writes to disk, Traverso said.

Traverso goes onto say that Facebook will opensource Presto this coming Fall.

See my prior post on a more technical description of Presto: Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.

Bear in mind that getting an answer from 250 PB of data quickly isn’t the same thing as getting a useful answer quickly.