
Apache Drill improves big data SQL query engine

Open source SQL query engine gets new release that will expand the number of data sources that can be queried, including Elasticsearch, Splunk and Apache Cassandra.

The open source Apache Drill project's 1.19 release is now generally available.

The update, first introduced in June, brings improved performance and new data connector capabilities.

Apache Drill is a SQL query engine for NoSQL databases, cloud storage and data lakes. The Apache Drill 1.19 release includes new connectors for Elasticsearch, Splunk and Apache Cassandra. Drill now also integrates more easily with Apache Airflow, an increasingly popular workflow management platform.

Gartner analyst Merv Adrian noted that Drill remains a popular and active Apache project and is a part of HPE's Ezmeral Data Fabric. Meanwhile, the open source community's efforts are positive for enterprise data management, he said.

"Even as commercializers spread adoption to less-technical users with more user-friendly enterprise-facing offerings, the rich font of creativity continues to push the state of the art forward," Adrian said. "The impressive list of firms using Apache Drill speaks to the ongoing DIY mentality in aggressively competitive firms who continue to see open source data management software as a possible leg up."

How Apache Drill fits into the data landscape

At its core, Drill is a distributed, interactive SQL query engine that enables users to point it at data, then query using standard SQL.
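As a rough illustration of the "point it at data, then query" workflow, the Python sketch below submits a SQL statement to a Drill node over its REST interface. It assumes a Drill instance listening on the default web port (8047) on localhost and uses the `employee.json` sample file on Drill's classpath (`cp`) storage plugin; the function names are made up for this example, and the host and query should be adjusted to the actual setup.

```python
import json
import urllib.request


def build_drill_query_request(sql, base_url="http://localhost:8047"):
    """Build a POST request for Drill's /query.json REST endpoint.

    base_url is an assumption: a local Drill node on the default web port.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, base_url="http://localhost:8047"):
    """Send the query and return the result rows (requires a running Drill node)."""
    request = build_drill_query_request(sql, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("rows", [])


if __name__ == "__main__":
    # cp.`employee.json` is a sample data set bundled with Drill.
    for row in run_drill_query("SELECT full_name FROM cp.`employee.json` LIMIT 3"):
        print(row)
```

The same call works against a single-node install on a laptop or a node in a cluster; only the base URL changes.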

"The learning curve for Drill is extremely low, and it works just as well from a single node on a laptop to a massive cluster," said Charles Givre, vice president of Apache Drill and the CEO and co-founder of enterprise data platform vendor DataDistillr. "While Drill is built for interactive queries, it is not built for large ETL [extract, transform and load] jobs and lacks some of the resiliency of Apache Spark."

Givre noted that it is easy to connect Drill to cloud data lakes such as Amazon Simple Storage Service and that Drill can also connect to Microsoft Azure and Google Cloud. He added that work is in progress to enable Drill to connect to other cloud data such as Dropbox, OneDrive and Oracle Cloud.


New features in Apache Drill 1.19

Among the new features in the latest update are connectors for Elasticsearch, Splunk and Cassandra. Givre said these plugins are more sophisticated than connectors in previous releases. Specifically, he noted that the queries that get pushed down to the source system are considerably more optimized than in other storage plugins.

"Ultimately, this will lead to much better performance when querying these source systems," Givre said.

Another major contribution Givre highlighted is the XML format plugin, which now ships with Drill. Givre explained that users can now directly query XML files, including deeply nested ones, without defining a schema using standard SQL.
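To make the idea of schema-free querying over nested XML more concrete, here is a small conceptual sketch in Python. It is not Drill's implementation and does not use Drill at all; it only shows how nested XML elements can be turned into nested records without declaring a schema up front, which is the kind of structure a SQL engine can then expose as columns and maps. The sample document, the `book` row tag and the function names are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with two levels of nesting.
SAMPLE_XML = """
<books>
  <book><title>A</title><author><name>X</name></author></book>
  <book><title>B</title><author><name>Y</name></author></book>
</books>
"""


def element_to_record(elem):
    """Convert an element to a string (leaf) or a nested dict (has children)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    record = {}
    for child in children:
        value = element_to_record(child)
        if child.tag in record:
            # A repeated tag is collected into a list of values.
            if not isinstance(record[child.tag], list):
                record[child.tag] = [record[child.tag]]
            record[child.tag].append(value)
        else:
            record[child.tag] = value
    return record


def rows_from_xml(text, row_tag):
    """Treat every element named row_tag as one row and infer its fields."""
    root = ET.fromstring(text)
    return [element_to_record(elem) for elem in root.iter(row_tag)]


if __name__ == "__main__":
    for row in rows_from_xml(SAMPLE_XML, "book"):
        print(row["title"], row["author"]["name"])
```

The field names come from the document itself, so no schema has to be defined before the data can be read.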

He added that the XML capability has also been built into the REST plugin, meaning users can query APIs that return XML.

"The REST reader has been greatly improved, which means that it is relatively easy to query data behind REST APIs using Drill," Givre said.

He said he expects future Apache Drill releases will add more connectors for different data sources. Among the likely future connectors are ones for the Delta Lake project, which was created by Databricks and is now an open source project run by the Linux Foundation.

"I suspect that as more people use Drill, we will continue to see more integrations with popular analytic tools," Givre said.
