Managing Hadoop projects: What you need to know to succeed
While Hadoop technology has spent several years in the IT spotlight, the big data processing framework still hasn't been widely adopted. This episode of BizApps Today looks at obstacles to Hadoop adoption and why SQL-on-Hadoop tools might be able to improve the situation.
Craig Stedman -- executive editor for SearchDataManagement and SearchBusinessAnalytics -- learned more about these issues while covering the 2015 Pacific Northwest BI Summit in Grants Pass, Ore. Quoting Merv Adrian, an analyst with Gartner who led a discussion session on Hadoop, Stedman told host and site editor Fran Sales that the technology "has matured enough … to move beyond early adopters like Internet companies and marketing service providers and push its way into more traditional enterprises."
During an interview at the summit, Adrian said Hadoop is transitioning toward the mainstream. "That's an interesting place in the market," he added. "People who up until now have said, 'I'm just going to wait until we know that stuff is ready,' could at any moment now say, 'Yup, it looks like it is.'"
However, a 2015 Gartner survey pointed to lingering pessimism about Hadoop, as 54% of respondents said their organizations have no plans to use it. The two biggest barriers to adoption cited by Adrian are a lack of workers with necessary Hadoop skills and the fact that many companies don't have a real business need for Hadoop technology -- at least not yet.
More users on Hadoop via SQL-on-Hadoop?
There is a potential bright spot that could turn the tide for Hadoop: Vendors are introducing SQL-based analytics software designed to work with the processing framework. These new products let analysts write queries against Hadoop data in SQL, the standard query language for relational databases. That enables organizations looking to use Hadoop to tap into the SQL skills they already have in-house, Stedman said.
"But most of the SQL-on-Hadoop tools are very new and unproven, and another issue is that there are so many of them," he added. In fact, at the Pacific Northwest BI Summit, Adrian listed 14 competing SQL-on-Hadoop technologies. "There are a lot of options to consider, and some of those options might not pan out in the long run," Stedman said.
The Apache Spark processing engine, another emerging open source technology, is a possible wild card for Hadoop adoption. For now, Spark is used primarily to accelerate processing of data stored in the Hadoop Distributed File System, but it can also run in standalone mode -- making it a potential competitor to Hadoop in the long run. And it seems likely, Stedman said, that some vendors "will eventually want to break the ties and try to go it alone with Spark, at least for some applications."
Opening chapter of data storytelling
BizApps Today also looks into what may become a new role in companies: the data storyteller.
"We're talking about somebody who takes the results of analytics applications … and crafts a narrative to explain them to corporate and business executives so they can understand them and hopefully use them to make better decisions," Stedman said.
To fill this job, companies might hire a journalist or someone else professionally trained to tell stories, or look internally for a charismatic data scientist or other analyst who can successfully explain results to execs.
At the Pacific Northwest BI Summit, there was some talk about storytellers eventually becoming even more important than data analysts themselves. But Stedman predicted that data storytelling will go hand in hand with traditional data analytics duties. After all, good data is the starting point for creating a compelling business story, he said. Watch the video to hear more about that and the prospects for broader adoption of Hadoop technology.