Editor at Large
Published: 05 Aug 2016
Hadoop turned 10 this year by some historical measures: when it became an Apache subproject and was given a name, when the first Hadoop code was released, when the first users deployed that code. To mark the occasion(s), executives from big data vendors are giving the distributed processing framework gift-wrapped accolades for its impact on data management and analytics processes over the past decade.
Not surprisingly, the celebrants include some of the people who played central roles in getting Hadoop off the ground. Doug Cutting, co-creator of the technology and now chief architect at Hadoop distribution vendor Cloudera, said Hadoop architectures have enabled businesses to "become much more data-driven -- and not on the periphery of organizations, but in the center." Fellow co-creator Mike Cafarella, a computer science professor and CEO of analytics startup Lattice Data, chimed in to say that before Hadoop, companies were "leaving huge amounts of really interesting [analytics] work on the table" because of the processing limitations of relational databases.
There's more. "Almost any enterprise you find that cares about its data is somewhere on a journey with Hadoop," said Sean Suchter, whose web search technology team at Yahoo became the first production user of Hadoop in 2006; Suchter is now CEO of Hadoop performance management startup Pepperdata. Raymie Stata, chief architect for search and advertising systems at Yahoo 10 years ago and now head of big data cloud services provider Altiscale, lauded Hadoop for giving programmers and analysts "direct access to all of the data in the enterprise, bypassing the high priests of data who could slow everything down" in traditional data warehouse environments.
You could be forgiven for taking words of praise from Hadoop's progenitors with a grain of salt. In this case, though, the favorable views of Hadoop have merit.
Hadoop can't be credited with starting the business world down the data-driven analytics path; data warehouses and business intelligence (BI) systems began finding their way into companies more than two decades ago. And self-service BI tools that put analytics power in the hands of business users emerged in the mid-2000s. But Hadoop architectures have taken things to a different level, opening up new types of data for analysis and making it more feasible -- technically and economically -- to collect, process and use all the information flowing into organizations.
Take Uber, for example. The ride-sharing company was in danger of stalling out on analytics until it deployed a Hadoop data lake last year along with the Spark processing engine and other technologies. "We had data sets that weren't available [for analysis] within the company before -- now they are," said Vinoth Chandar, a senior software engineer at Uber. The Hadoop environment has become "the source of truth for all analytics data," he added, noting that Uber is looking to "make every decision data-driven."
General Electric's GE Power Services unit is another organization that's using a Hadoop-based system architecture -- front-ended by self-service BI software -- to create a more data-driven culture. Chief enterprise architect Don Perigo said GE Power Services went from 120 workers using a conventional BI and reporting system four years ago to 22,000 users of the big data platform. Executives set a goal of 50% utilization in individual business units; in some departments, Perigo said, the adoption rate is up to 98%.
The University of Texas MD Anderson Cancer Center envisions the same sort of thing happening there. Right now, a lot of its data "is just dark -- we can't get to it and we can't use it," said Bryan Lari, director of institutional analytics and informatics. "The goal is to get to where everyone, from executives to admins, are using data to drive decisions." The vehicle: a Hadoop cluster that began operation in March.
The 10-year milestones come at a time when the future of Hadoop as we've known it is in question. Spark is pushing aside the MapReduce engine in many Hadoop architectures, and possible data storage alternatives to the Hadoop Distributed File System -- the framework's other original core component -- are springing up.
Hadoop may morph into a different set of components, or it could slowly fade from the scene, its throne usurped by other big data tools that have grown up around it. But even if the latter happens, Hadoop will have accomplished far more than Cutting likely imagined it would when he famously named the technology after his son's stuffed elephant a decade ago. And the data-driven environments it has fostered will remain -- which is worthy of some congratulatory pats on the back.