The widening field of database architectures formally gained a new contestant last week as startup Splice Machine Inc. launched a SQL-on-Hadoop database that it said brings ACID-compliant transactions and SQL queries to bear on data stored in the Hadoop Distributed File System (HDFS).
The San Francisco-based company made Version 1.0 of its namesake database generally available, after a public beta-testing period that began last May. As part of the offering, Splice Machine deployed a modified implementation of open-source Apache Derby, a lightweight Java-based relational database, on top of HDFS and its companion HBase NoSQL database. Derby provides the SQL capabilities, while HBase sits on top of HDFS and supports the scaling of database tables across multiple servers in Hadoop clusters.
Unlike many new databases that aim at specialized data processing needs, Splice Machine's is intended to be a general-purpose platform that can handle a mix of transaction processing and analytics applications, according to Monte Zweben, the vendor's co-founder and CEO. Running the software on Hadoop offers economical scalability, he said, while the support for relational SQL and the ACID database properties -- atomicity, consistency, isolation and durability -- adds the transactional backbone that most businesses are familiar with from relational database management systems (RDBMS).
Looking for the Holy Grail in Hadoop
For Rob Fuller, a general-purpose database means a hybrid database that can process both transactions and analytical queries. Fuller is managing director of product innovation at Harte Hanks, a former newspaper publishing and broadcast media company that now focuses on marketing services. Fuller has worked with beta versions of the Splice Machine software, and he sees potential value in incorporating the Hadoop RDBMS into Harte Hanks applications that typically field a range of marketing databases for each customer.
"Real-time transactional processing on Hadoop is a little bit of a Holy Grail," Fuller said -- and Harte Hanks was also looking for better analytics performance and scalability than it was getting from the Oracle Real Application Clusters (RAC) databases it had deployed previously. In proof-of-concept projects, Fuller said, the new software worked well on some problematic queries of tables with more than 800 million rows of data: Compared with an Oracle RAC database, Splice Machine ran those queries three to seven times faster. And Fuller anticipates that adding more servers to scale out the Hadoop clusters running the database will yield even greater performance gains.
Derby on comeback trail
The Splice Machine database is being aimed at an interesting space, according to analysts.
"Splice Machine would like people to see it as a data warehouse platform that also supports transactions," said IDC analyst Carl Olofson. He thinks the scalability afforded by the Hadoop database architecture could be a plus for the software. "It has potential as a scalable operational store," Olofson said, adding that Splice Machine's use of open source software and support for commodity clusters could also make it a cost-effective option.
Splice Machine's approach "may be the nearest equivalent to an Oracle [database] running over Hadoop," said Robin Bloor, chief analyst at The Bloor Group. Bloor called the product "the first pure-play general-purpose database on Hadoop." He also credited Splice Machine for work the company did to optimize the underlying Derby database for distributed uses.
Splice Machine's database represents something of a comeback for the Derby system, which was conceived by Cloudscape Inc. in the 1990s. That company was later bought by Informix Software Inc., which in turn was purchased by IBM in 2001. In 2004, IBM ceded Derby to the Apache Software Foundation as open source technology.
Splice Machine is available in two editions: a free "startup edition" for companies that are less than five years old and have under $10 million in annual revenue, and an enterprise edition that includes a free development and test node license and a paid license for staging and production uses. Pricing for that license starts at $5,000 per node annually and includes full support.