Pivotal Software Inc. has pulled the plug on its Hadoop distribution and will resell Hortonworks Inc.'s version -- a decision that further moves the Hadoop market back to where it began, with three independent vendors as the primary suppliers of the open source distributed processing framework.
In addition to Hortonworks and rivals Cloudera Inc. and MapR Technologies Inc., IBM and Amazon Web Services (AWS) continue to offer Hadoop distributions. Like Pivotal, though, IBM last year aligned itself with Hortonworks as part of the Open Data Platform initiative, a group now known simply as ODPi, which is working to create a common set of core specifications for Hadoop platforms.
Some Hadoop and big data technology analysts expect IBM to eventually follow Pivotal's lead. "The only real question, at this point, is how long IBM will continue to keep its own distribution alive," said independent consultant Thomas Dinsmore.
AWS, meanwhile, has hedged its bets by allowing users of Amazon Elastic MapReduce (EMR), its cloud-based Hadoop managed service, to use either its homegrown Hadoop distribution or MapR's. Tony Baer, an analyst at London-based Ovum, said he thinks AWS is primarily interested in selling the EMR service, no matter which of the two supported distributions customers choose.
With IBM gravitating toward Hortonworks via ODPi, Pivotal's withdrawal from the Hadoop market "just formalizes that we've really winnowed down to four players, or three and a half if you count EMR [as a split offering]," Baer said.
Vendor land rush on the Hadoop market
After Cloudera, Hortonworks and MapR emerged as Hadoop-oriented startups, several major IT vendors jumped into the market in search of big data business. In addition to IBM and AWS, Intel introduced a Hadoop distribution in February 2013. The same week, data storage vendor EMC announced a distribution called Pivotal HD; it soon handed the Hadoop software over to Pivotal, a spin-off that launched that April and is jointly owned by EMC and subsidiary VMware.
But the Hadoop business didn't develop for either Intel or Pivotal as they hoped it would. Intel dropped its distribution just a year later, bought an 18% stake in Cloudera and anointed that company's version as its preferred Hadoop platform. And then, early last year, Pivotal allied itself with Hortonworks, signing on as a charter ODPi member and making its other big data technologies -- including the Greenplum analytical database and SQL-based HAWQ query engine -- available on the Hortonworks Data Platform (HDP) distribution. In addition, Hortonworks began providing technical support for Pivotal HD users.
Pivotal officials said at the time that Pivotal HD would continue to be developed and sold. But Baer described the company's decision to give up on the distribution and standardize on HDP from Hortonworks as the "inevitable culmination" of the initial steps taken a year ago.
In January, Forrester Research included Pivotal among the vendors covered in a Forrester Wave report on Hadoop distributions, along with Cloudera, Hortonworks, IBM and MapR. (Amazon EMR and other cloud-only Hadoop services will be covered in a separate report.) Pivotal was the only one of the five that didn't get ranked as a Hadoop market leader.
"It was kind of apparent that maybe they hadn't been investing in the core Hadoop platform," said Forrester analyst Mike Gualtieri, who co-authored the Wave report. He added that Pivotal's strong suit is more in the other big data technologies it has developed on top of Hadoop -- particularly HAWQ, which the company now markets as Pivotal HDB after open-sourcing the query engine last year.
Pivotal: We aren't going away completely
As part of this week's announcement, Pivotal said it will still sell Pivotal HDB, Greenplum and its GemFire in-memory data grid software for big data uses, in addition to its Cloud Foundry platform for building and managing cloud-based applications. Hortonworks, meanwhile, will offer Pivotal HDB under its own name as an alternative to the Hive open source SQL-on-Hadoop query engine it already supports. Hortonworks said Hive is best suited to batch querying of large data sets, while the Pivotal software can provide faster processing on smaller ones.
In addition, Hortonworks and Pivotal plan to jointly offer an upgrade program for users of the latter's Hadoop distribution who want to switch to HDP. That likely won't amount to a lot of new customers for Hortonworks -- Pivotal never made significant headway in the Hadoop market, according to Gartner analyst Merv Adrian. Even so, he called the expanded deal between the companies "a potential win for Hortonworks, with little downside."
For Hadoop users overall, Gualtieri doesn't see the shrinking number of Hadoop distributions to choose from as a bad thing -- just the opposite.
"I think there's less stress in the decision," he said, noting that after IBM joined the ODPi group last year, he heard positive feedback from IBM users about the company teaming up with other Hadoop vendors to work toward common specifications for their distributions.