IBM is dropping its own Hadoop platform and adopting Hortonworks Inc.'s instead as part of a two-way deal that could give users of both companies increased access to enterprise-class capabilities for managing and analyzing big data.
IBM will end development of BigInsights, its distribution of Hadoop, and work to migrate existing users to the Hortonworks Data Platform (HDP). In return, Hortonworks will resell IBM's Data Science Experience suite of tools for collaborative analytics, as well as Big SQL, a SQL-on-Hadoop query engine developed by IBM. The two companies will also work jointly to expand the features of Apache Atlas, an open source data governance framework spearheaded by Hortonworks.
The deal should be "a shot in the arm" for the product offerings of both vendors, Ovum analyst Tony Baer said at DataWorks Summit 2017 in San Jose, Calif., where the deal was announced.
IBM "gets to say they have a Hadoop strategy again," after facing widespread doubts about the future of BigInsights over the past two years, Baer said. And Hortonworks can now offer its users a data science platform to help them coordinate work on advanced analytics applications, matching technologies previously released by Hadoop rival Cloudera Inc. and several analytics vendors, in addition to IBM.
During the conference's opening keynote session, Rob Thomas, general manager of IBM Analytics, said leaning on HDP as the underlying Hadoop platform will let IBM focus more on developing its data science and machine learning technologies. That functionality will then be made available to both IBM and Hortonworks users through the Data Science Experience, which IBM shortens to DSX.
Machine learning comes to the fore
The heightened focus on products like DSX and Cloudera's rival Data Science Workbench aligns with the growing use of machine learning and other forms of artificial intelligence (AI) cited by various attendees at the conference, which was organized by Hortonworks and its former parent company, Yahoo.
For example, Duke Energy Corp. is looking to use machine learning and AI tools to identify potential equipment problems by analyzing sensor data from its transmission network, said John Pressley, director of information architecture at the electric utility and natural gas distributor in Charlotte, N.C.
Machine learning fueled by data science techniques could also enable more proactive and personalized customer service, Pressley said during a user panel discussion.
"Where we want to take it next is more streaming and real-time information so there's nothing that stops us from answering a customer's call and understanding what their sentiment is on that phone call," he said. "And then we can be changing [promotional] offers as we get to know what that customer's most important thing is."
Meanwhile, the planned development work on Atlas signals a continued elevation of data governance and security as key issues in big data environments, according to Gartner analyst Merv Adrian.
"The huge majority of Hadoop adopters that are currently stalled in attempting to get to broad production use will need to deal with those issues when and if they break the logjam," Adrian said. IBM's involvement with Atlas could provide "real credibility and experience" to the still-emerging governance technology, he added.
But the move raises questions about an existing Hadoop platform partnership between Hortonworks and Microsoft. In addition, it continues the consolidation of the Hadoop market, which is now down to four vendors: Hortonworks and fellow big data specialists Cloudera and MapR Technologies, plus cloud-platform market leader Amazon Web Services.
An air of inevitability for IBM
IBM's decision to throw in its lot with Hortonworks on Hadoop follows a similar move by Pivotal Software in April 2016. At the time, industry analysts questioned how long IBM would hold out on offering its own Hadoop platform, especially in light of its teaming up with Hortonworks, Pivotal and other companies in 2015 on the ODPi effort to create common reference specifications for different Hadoop distributions.
The mothballing of BigInsights in favor of HDP "seems inevitable in hindsight," Adrian said. "IBM likely had as many developers as customers on BigInsights. And many of its users were essentially 'given' the offering as part of larger deals -- usage stories have been few and far between."
IBM declined to disclose the number of existing BigInsights users that would be affected by the deal.
Microsoft is also in the Hortonworks camp, having built its cloud-based Azure HDInsight managed service for big data users on top of HDP. A Microsoft executive who spoke during the opening keynote session at the DataWorks Summit didn't mention the Hortonworks-IBM deal or HDInsight; a spokesman for Microsoft later said the company had no comment on the new agreement.
IBM's embrace of HDP will increase the size of the Hortonworks customer base, said Jamie Engesser, vice president of product management at Hortonworks. That, in turn, "gives Microsoft a bigger target base for moving workloads to the cloud," Engesser said. "They can give IBM users good options to run existing workloads or new ones in the [Azure] cloud."
Both Adrian and Baer described the relationship between Microsoft and Hortonworks as successful for the two companies thus far. The deal with IBM likely won't result in many deployments of HDP on IBM's Bluemix cloud platform as an alternative to HDInsight, Adrian said, but choosing between IBM and Microsoft in competitive situations "will be an interesting challenge" for Hortonworks.
For Hortonworks' own users, DSX offers a more enterprise-ready workbench environment for data scientists than the Apache Zeppelin notebook tool that the company has been offering, Engesser said. The same goes for Big SQL vs. Apache Hive, the SQL query engine that Hortonworks has backed to date.
Engesser said Hortonworks will continue to focus most of its own development resources on HDP; related open source technologies that can help users manage the core Hadoop platform, including Atlas and the Apache Ranger security framework; and Hortonworks DataFlow (HDF), a separate tool for managing the movement of data between systems. An HDF 3.0 update was released last week with new features for creating streaming analytics applications and a centralized repository for data schemas.