Hadoop gets a lot of attention, but the big data framework's entry into the IT mainstream has been slow: only 10% of respondents to a 2015 Gartner survey were using it in production applications. One of the gating factors is the complexity of programming in MapReduce, Hadoop's companion processing framework. But emerging SQL-on-Hadoop query engines offer a potential gateway to broader Hadoop use.
Here are three key points to keep in mind about SQL-on-Hadoop platforms:
SQL programmers could be the cavalry coming to Hadoop's rescue. Thus far, Hadoop has largely been the province of programmers with advanced skills writing MapReduce programs in Java. There aren't enough of those programmers to go around, and they are expensive to hire and retain. Integrating SQL, the standard query language for relational databases, with Hadoop opens the platform to the armies of developers and data analysts who are steeped in SQL know-how and already encamped inside most organizations.
Batch jobs are no longer the only game in town. MapReduce supports only batch workloads, which typically run on a predefined schedule. Some SQL-on-Hadoop platforms are also geared to batch processing, but others support interactive and ad hoc queries from mainstream business intelligence (BI) tools. That lets users do self-service BI and real-time analytics against data in Hadoop clusters.
There are a lot of options, and a lot to think about. More than a dozen SQL-on-Hadoop tools, some open source and others commercial products, are available, and their ranks continue to grow. Most are still immature, and some support only a subset of standard SQL. And because each is optimized for different kinds of applications, prospective users need to understand a tool's best-fit uses before choosing one.
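To make the first point concrete, here is a minimal sketch in Python, not Hadoop code: the same per-region sales total is computed once in the map, shuffle and reduce steps that a MapReduce program spells out, and once as a single SQL GROUP BY query. The sample data and function names are hypothetical, and Python's built-in sqlite3 module merely stands in for a SQL-on-Hadoop engine. Real MapReduce jobs are typically written in Java against Hadoop's APIs and run distributed across a cluster, so this only illustrates the difference in programming style.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale_amount) records of the kind
# that might be stored in files on a Hadoop cluster.
SALES = [("east", 120), ("west", 80), ("east", 45), ("north", 60), ("west", 20)]


def map_phase(records):
    """Emit (key, value) pairs, as a MapReduce mapper would."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum each key's values, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_totals(records):
    """Imperative style: the programmer wires up every phase by hand."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def sql_totals(records):
    """Declarative style: one SQL statement expresses the same aggregation.

    sqlite3 is only a local stand-in for a SQL-on-Hadoop engine here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(mapreduce_totals(SALES))
    print(sql_totals(SALES))
```

Both functions produce the same totals; the difference is that an analyst who already knows SQL only has to write the one-line query, while the MapReduce version requires understanding and coding each processing phase.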