Startup JethroData recently released Version 1.0 of its index-based SQL-on-Hadoop engine for big data analytic...
Like other SQL-on-Hadoop engines, JethroData supports popular BI tools such as Tableau and QlikView, accepting connections via JDBC or ODBC and fielding standard SQL queries. With its search-style indexing, JethroData aims to address the shortcomings it sees in how such tools connect to Hadoop data today.
"Data tables have become much bigger in the Hadoop ecosystem," said Eli Singer, CEO and co-founder of JethroData. "When BI tools are pointed at those tables, they run into some of the performance limitations of Hadoop."
He said queries that people expect to execute in a few seconds can take minutes. "When data becomes really big, it is extremely expensive in terms of execution to do SQL-on-Hadoop," Singer continued.
To improve performance, the JethroData engine automatically creates indexes as multi-hierarchy compressed bitmaps stored in columns. One goal of this architecture is to avoid expensive random writes and locking. That approach makes sense given that JethroData's target use cases include interactive ad-hoc queries, as well as reports and live dashboards.
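To make the bitmap idea concrete, here is a minimal Python sketch of a generic, uncompressed bitmap index. It is not JethroData's multi-hierarchy compressed format, which the article does not detail; the names `BitmapIndex` and `to_row_ids` and the sample data are hypothetical. The point it illustrates is that a filter on an indexed column becomes a lookup of a precomputed bitmap, and combining predicates becomes a cheap bitwise AND or OR instead of a scan over the raw rows.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy, uncompressed bitmap index over one column: one bitmap per distinct value.

    Each bitmap is a Python int used as a bitset: bit i is set when row i holds
    that value. This is a generic illustration, not JethroData's format.
    """

    def __init__(self, values=()):
        self.bitmaps = defaultdict(int)
        self.num_rows = 0
        for value in values:
            self.append(value)

    def append(self, value):
        # A new row only sets the bit at the next row id; bits for existing
        # rows are left untouched (an append-only assumption for illustration).
        self.bitmaps[value] |= 1 << self.num_rows
        self.num_rows += 1

    def rows_equal(self, value):
        # Bitmap of rows where the column equals `value` (0 if it never appears).
        # dict.get on a defaultdict does not insert a new key.
        return self.bitmaps.get(value, 0)


def to_row_ids(bitmap):
    """Expand a bitset into the sorted list of row ids whose bits are set."""
    row_ids = []
    row = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row)
        bitmap >>= 1
        row += 1
    return row_ids


# Hypothetical sample data: two indexed columns of a six-row table.
region = BitmapIndex(["EU", "US", "EU", "APAC", "US", "EU"])
product = BitmapIndex(["A", "B", "B", "A", "A", "B"])

# WHERE region = 'EU' AND product = 'B'  ->  bitwise AND of two bitmaps
eu_and_b = to_row_ids(region.rows_equal("EU") & product.rows_equal("B"))

# WHERE region = 'US' OR product = 'A'   ->  bitwise OR of two bitmaps
us_or_a = to_row_ids(region.rows_equal("US") | product.rows_equal("A"))

print(eu_and_b)  # [2, 5]
print(us_or_a)   # [0, 1, 3, 4]
```

Production engines typically store such bitmaps in a compressed form (schemes such as Roaring or WAH are common choices) so that sparse or clustered values take little space; the sketch leaves compression out to keep the bit logic visible.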
Singer said the company has partnered with Qlik, Tableau and Hadoop distribution providers so that users can interactively tap into data stored in Hadoop. But the software does not work solely with Hadoop: the JethroData engine can also access Amazon S3 data in the cloud.
Version 1.0 represents the software's emergence from an extended beta. Along the way to general availability, JethroData has added:
- Adaptive caching
- Query queuing to trim some performance spikes
- Features that eliminate joins generated by BI tools where those joins are unnecessary and hurt performance
JethroData is headquartered in New York City with offices in Tel Aviv, Israel. Why the name JethroData? The founders were struck by the Bible story of Jethro, Moses' father-in-law, who suggested that a harried Moses distribute his adjudication workload, which had come to consume too much of his time as leader of the Hebrew people. The connection to the JethroData architecture is that the database follows Jethro's original advice to distribute the workload.