michelangelus - Fotolia

News Stay informed about the latest enterprise technology news and product updates.

Originators form group to boost Presto SQL query engine

The Presto engine arose as an alternative to Hive for big data queries. Now, the Presto Software Foundation has formed to promote the SQL query software's virtues.

The Presto SQL query engine is determined to break out from the crowded pack of open source analytics tools.

To that end, members of the original Facebook Presto development team have joined with others to form the Presto Software Foundation.

The new group's goal is to boost Presto's open source credentials, and ensure the software's quality and extensibility, while moving the Presto SQL query engine forward in mainstream organizations.

The software, which debuted in 2012, has found adherents at Airbnb, Uber, Walmart and other companies. Vendors supporting Presto include Arm Treasure Data, AWS, Google, Microsoft, Qubole, Starburst Data, Teradata, Varada and others.

The software has seen a steady climb. Presto ranks 41st in a relational database category comprising 136 systems, as measured in the latest edition of the DB-Engines popularity ranking.

Presto SQL query engine on rise

While it has seen an uptick in recent years, Presto still significantly trails Apache Hive, a predecessor Facebook framework for batch query processing. That sits at number 10 in the DB-Engines relational database ranking.

As an older tool, Apache Hive has a lead on the Presto SQL engine. Hortonworks and others have devoted energy to improving its performance for queries. Also, Apache Hive has been a key component in many widely used Hadoop software distributions. Other open source SQL engine entrants include Drill, Hawq, Impala and Spark.

The numbers of Presto users outside of Facebook have grown dramatically, according to Martin Traverso, a co-creator of the software, and a co-founder of the Presto Software Foundation, which was formally announced Jan. 30. He pointed to Lyft, Netflix, Twitter and others as exemplary Presto users.

"Now, among users there is a general dependency on the success of the project," Traverso said. "Going forward, we feel it is important that the project remain independent."

So, to assure useful technology roadmaps, transparency and clarity, Traverso continued, he and others decided to set up the not-for-profit organization. The foundation is considering Presto SQL engine roadmaps security models for cloud object storage, failover and recovery protocols and means to better support ETL workloads, he said.

Presto SQL query engine timeline
The Presto SQL query engine arose within social media powerhouse Facebook. A foundation has formed to further Presto.

Multiple back ends

Traverso said Presto has made much progress since its first days as an ad hoc query alternative to batch Hive at Facebook. Meanwhile, the designers' decision early on to support execution on multiple data back ends has proved important.

Presto provides a SQL abstraction layer that allows access to data wherever it may be.
Justin BorgmanCEO, Starburst Data

Justin Borgman, CEO at Starburst Data, said the broad federated back-end capabilities of Presto -- for example, the ability to join log data stored in S3 with customer data stored in MySQL -- are a significant driver of interest. Outside of Facebook, Starburst technical staff forms the largest group of committers to the Presto project to date, Borgman said.

In fact, Starburst principals spearheaded much of the effort to enterprise-harden Presto. They served first at Hadapt, an early SQL on Hadoop player that Teradata bought in 2014. At Teradata, the Hadapt crew turned its attention to improving Presto's performance. The work was carried on at Starburst, which was spun out of Teradata in late 2017.

"Presto provides a SQL abstraction layer that allows access to data wherever it may be," Borgman said. This is useful as data lake architectures, which started out as Hadoop-based, are now moving to encompass cloud object storage, he continued.

The PostgreSQL of big data? Presto

For his part, Borgman said having an organization like the foundation -- one independent of corporate interests -- is an important step in the spread of the Presto SQL query engine.

He said he hopes the software could find a reception akin to popular open source databases -- potentially becoming something like "the PostgreSQL of big data analytics."

As tools for SQL on Hadoop analytics began appearing a few years ago, a steady stream became a small torrent. Now, as SQL on Hadoop morphs into SQL on anything, a foundation that focuses on Presto could help it stand out in that stream.

Dig Deeper on Big data management

Join the conversation


Send me notifications when other members comment.

Please create a username to comment.

What role do you see for SQL engines as back-end data becomes more voluminous and varied?
Well Jack, 
I have a question for you and the world....
"Who's your Data....base? Ancestrial wisdom will continue to be transmitted
transacted for our inheritance.
Fortunately, humans are evolving and because we are using the mind's capability of awareness, this has opened further realms of knowledge that we require to translate in more physical form of simpler understanding.
Yet all this communicational energy we transact daily...
we seem to be placing a higher price ju$t to retrieve a slither of information previously harvested from everything...silently. It's cool though :)
Sooo back to your question Jack.
I see SQL engines as back end data becoming more like...
individually wrapped slices of bread.
Instead of packaging all these interconnected data into larger "bags of exposed sliced bread".... owning our direct wireless carriers (individual humans) might be the answer to zero latency.

I know I know
that was a mindful ...right

Joining the mission to "...having an organization like the foundation -- one independent of corporate interests -- is an important step..."
SQL engines are the next best thing after sliced bread