Data professionals take their benchmarks as most people take their scrambled eggs -- with a few grains of salt. Still, it is easy to see why Hadoop distro provider Cloudera has been publicizing recent performance benchmarks intended to show that its Impala MPP query engine is quickly maturing. After all, it faces stiff competition from existing SQL engines.
Impala is Cloudera's chief means of bringing SQL analytics to the Hadoop platform. It has evolved quickly, like a lot of other Apache Hadoop data management elements. Some question how soon such Hadoop pieces can grow to compete with established SQL-based analytical alternatives. Enter the benchmark.
On identical hardware, Cloudera Inc. ran a series of 20 queries (based on the industry-standard benchmark TPC-DS) to evaluate Impala's performance against an unnamed analytical database from a major vendor -- nondisclosure agreements bar Cloudera from naming that vendor.
Cloudera said Impala, on average, ran two times faster than the established alternative, dubbed DBMS-Y, outperforming it in 17 of 20 queries.
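For readers who want to sanity-check headline figures like these on their own workloads, the following is a minimal sketch of how per-query timings can be reduced to a win count and an average speedup. It is not Cloudera's methodology, which the company's summary does not spell out; the function name `summarize`, the choice of a geometric mean, and the timing values are all illustrative assumptions.

```python
from statistics import geometric_mean


def summarize(baseline_secs, candidate_secs):
    """Compare two engines query by query (hypothetical helper).

    Each list holds one runtime in seconds per query, in the same order.
    """
    if not baseline_secs or len(baseline_secs) != len(candidate_secs):
        raise ValueError("need two non-empty timing lists of equal length")

    # Speedup above 1.0 means the candidate finished that query faster.
    speedups = [b / c for b, c in zip(baseline_secs, candidate_secs)]
    wins = sum(1 for s in speedups if s > 1.0)

    return {
        "queries": len(speedups),
        "wins": wins,
        # A geometric mean keeps one very fast or very slow query
        # from dominating the average of the ratios.
        "geo_mean_speedup": geometric_mean(speedups),
    }


# Made-up timings for illustration only; these are not benchmark results.
dbms_y = [10.0, 8.0, 6.0, 4.0]
impala = [5.0, 2.0, 3.0, 8.0]

summary = summarize(dbms_y, impala)
print(summary)
```

Whether an "average" is arithmetic or geometric, and whether it is taken over runtimes or over ratios, can change the headline number, which is one reason to take vendor benchmarks with the grains of salt mentioned above.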
The company's test "demonstrates scalability and the ability to support real interactive, multiuser workloads," according to Marcel Kornacker, architect of Cloudera's Impala project.
Underlying Impala's speed is an architecture that reduces data transforms and data movement.
"Impala runs natively in Hadoop and runs against all data with no need to transfer data out of Hadoop into storage engine," Kornacker said.
For Impala, the goal is to cover all SQL functionality used by analytical databases such as HP Vertica, IBM Netezza and others. Doubling the performance of such systems, while impressive, is probably not enough of a boost to make users rip out existing infrastructure and platforms. But it may point to a capacity for growth that will interest shops that have begun to explore Hadoop systems.
NuoDB sees SQL on the rebound
After a few years of surging NoSQL technology, SQL itself is seeing a resurgence in some new settings. Hadoop players are feverishly adding SQL traits to their data management offerings, and a few database systems have stepped forward as "NewSQL" alternatives built from the ground up to combine elements of NoSQL and SQL.
NuoDB Inc. is among the NewSQL players, alongside VoltDB, TransLattice Inc., FoundationDB and others. The company recently released an update (version 2.0.2) of its NuoDB asynchronous peer-to-peer database, which it said improves network performance for geodistributed operations and streamlines some SQL functions. Both geodistribution and SQL support are areas of emphasis for NuoDB's efforts.
"Even in the NoSQL world, people are adding SQL-like capabilities," said Barry Morris, founder and CEO of NuoDB. As well, the arrival of cloud computing and geodistributed data sets has had a big effect on how operational databases are designed going forward, he added.
Forrester analyst says cognitive computing is good match for big data
With predictive analytics and other percolating technologies, it sometimes seems as if the names change but the game is the same. This point has been raised concerning cognitive computing, IBM's term for its Watson technology, which combines software and hardware to process human language and learn as it goes.
That sounds a lot like expert systems and artificial intelligence systems of the distant past. But some things are new here, according to Mike Gualtieri, principal analyst at Forrester Research Inc. In the AI days, "the focus was always on performance. You didn't have the processing power," he said. Moreover, "you didn't have a lot of data."
"Fast-forward," he said. "Now we have enormous computing power. And now -- thanks in great part to mobile applications specifically and a proliferation of applications generally -- we have a whole lot more data."
Machine-learning algorithms, which underlie predictive analytics, data mining and emerging cognitive computing, may be the only way to plow through the mountains of big data now accumulating, Gualtieri suggested.