Strata + Hadoop World 2016: Hadoop and Spark in spotlight

Last updated: March 2016

Editor's note

Hadoop and the Spark data processing engine now share the spotlight at Strata + Hadoop World, which focuses on big data management and analytics technologies. And the Strata + Hadoop World 2016 conferences held in the U.S., one in the spring and one in the fall, took place at a noteworthy time in the evolution of both Hadoop and Spark.

Hadoop turns 10 this year, at least by some measures. For example, The Apache Software Foundation created a separate open source subproject for managing Hadoop development in January 2006, and the first public release of Hadoop code followed that April. Now the distributed processing framework is at something of a crossroads, as its original core components get augmented -- and perhaps supplanted -- by other technologies. In particular, MapReduce, initially Hadoop's all-in-one cluster resource manager and programming and processing environment, is being shunted aside by a combination of Spark, SQL-on-Hadoop query engines and the YARN resource management platform underpinning Hadoop 2. Users also have alternatives to the Hadoop Distributed File System (HDFS) for storing data -- for example, the Kudu columnar data store that Hadoop distribution market leader Cloudera Inc. introduced at the 2015 Strata conference in New York for use in real-time analytics applications involving streaming data.

Meanwhile, a 2.0 version of the Apache Spark open source software was released in July 2016 with updates to its stream processing, machine learning and Spark SQL modules, plus a promised performance boost. Hadoop and Spark are often paired in deployments, with Spark being used to accelerate the processing of data stored in Hadoop. But Spark can also run on its own against data in other platforms, such as NoSQL database systems or cloud-based Amazon Simple Storage Service implementations. Spark proponents and some industry analysts foresee a possible future in which today's Hadoop ecosystem is joined by one surrounding Spark.

New Spark and Hadoop developments, and the relationship between the two technologies, were prominent discussion topics at Strata + Hadoop World 2016 in both San Jose, Calif., in March and New York in September. Numerous presentations on big data trends and best practices for managing deployments were also in the spotlight at the conferences, which were jointly organized by Cloudera and O'Reilly Media Inc. In the sections below, you'll find our coverage of the two events, plus stories from last year's Strata conferences and other content on Hadoop, Spark and related technologies for managing and analyzing pools of big data.

1. New developments in Hadoop, Spark and related technologies

Big data platforms are evolving rapidly, with some open source technologies getting four or more releases annually. Even Hadoop, now 10 years on from its initial development, is far from settled as a technology. This section includes stories on recent news, trends and user deployments involving Hadoop and Spark, as well as other big data technologies.