Strata + Hadoop World 2016: Hadoop and Spark in spotlight
-
Article
Big data analytics get a boost with real-time streaming
Comcast Corp. and other companies are turning to real-time processing and analytics technologies like Apache Kudu to help find useful information in streams of big data.
-
Article
IT vendors aim to make it easier to move Hadoop to the cloud
As users continue to shift big data management and analytics apps to the cloud, vendors are working to ease the process, and lower the price, of migrating Hadoop operations.
-
Video
Hadoop co-creator looks back, and ahead
Doug Cutting chats with SearchDataManagement at Strata + Hadoop World about Hadoop's origins and offers advice for developers to consider over the next decade.
-
Article
Data streaming takes center stage at Strata Hadoop 2016
Hadoop-based applications are increasingly including Spark Streaming, Kafka and other components that enable real-time streaming analytics capabilities. Find out why.
Editor's note
Hadoop and the Spark data processing engine now share the spotlight at Strata + Hadoop World, which focuses on big data management and analytics technologies. The two Strata + Hadoop World 2016 conferences held in the U.S., one in the spring and one in the fall, took place at a noteworthy time in the evolution of both Hadoop and Spark.
Hadoop turns 10 this year, at least by some measures. For example, the Apache Software Foundation created a separate open source subproject for managing Hadoop development in January 2006, and the first public release of Hadoop code followed that April. Now the distributed processing framework is at something of a crossroads, as its original core components get augmented, and perhaps supplanted, by other technologies. In particular, MapReduce, initially Hadoop's all-in-one cluster resource manager and programming and processing environment, is being shunted aside by a combination of Spark, SQL-on-Hadoop query engines and the YARN resource management platform underpinning Hadoop 2. Users also have alternatives to the Hadoop Distributed File System (HDFS) for storing data: for example, the Kudu columnar data store, which Hadoop distribution market leader Cloudera Inc. introduced at the 2015 Strata conference in New York for use in real-time analytics applications involving streaming data.
Meanwhile, version 2.0 of the open source Apache Spark software was released in July 2016 with updates to its stream processing, machine learning and Spark SQL modules, plus a promised performance boost. Hadoop and Spark are often paired together in deployments, with Spark used to accelerate the processing of data stored in Hadoop. But Spark can also run on its own against data in other platforms, such as NoSQL database systems or the cloud-based Amazon Simple Storage Service. Spark proponents and some industry analysts envision a future in which today's Hadoop ecosystem is joined by a second ecosystem built around Spark.
New Spark and Hadoop developments, and the relationship between the two technologies, were prominent discussion topics at Strata + Hadoop World 2016 in both San Jose, Calif., in March and New York in September. Numerous presentations on big data trends and best practices for managing deployments were also in the spotlight at the conferences, which were jointly organized by Cloudera and O'Reilly Media Inc. In the sections below, you'll find our coverage of the two events, plus stories from last year's Strata conferences and other content on Hadoop, Spark and related technologies for managing and analyzing pools of big data.
1. New developments in Hadoop, Spark and related technologies
Big data platforms are evolving rapidly, with some open source technologies getting four or more releases annually. Even Hadoop, now 10 years on from its initial development, is far from settled as a technology. This section includes stories on recent news, trends and user deployments involving Hadoop and Spark, as well as other big data technologies.
-
Article
Sellpoints buys into Spark SQL for big-data ETL workloads
Online marketing services provider Sellpoints Inc. is using Spark, including the technology's SQL programming module, to prepare incoming streams of Web activity data for analysis.
-
Article
Spark moving to a more mature stage, consultant says
Consultant Thomas W. Dinsmore sees Spark reaching a new level of maturity, one that includes more realistic assessments of the performance improvements it can provide.
-
Article
Hortonworks puts its Hadoop distribution on two 'cadences'
Hadoop vendor Hortonworks is creating separate release streams for the big data framework's core components and fast-evolving technologies such as Spark, Hive and HBase.
-
Article
Spark systems accelerate data jobs, usher MapReduce out
Several users who spoke at Spark Summit East 2016 in New York discussed their reasons for deploying Spark, including its ability to outpace MapReduce on many Hadoop batch jobs.
-
Article
Stream processing update at center of Spark 2.0 release
At this year's Spark Summit East event, Spark creator Matei Zaharia detailed what's coming in the next version of the technology, including enhancements to its Spark Streaming module.
-
Article
Spark, NoSQL databases link up for operational analytics push
Spark is primarily known for pairing up with Hadoop, but connectors that tie it to NoSQL databases are also being tapped by users looking to analyze operational data in real time.
-
Article
Hadoop co-creator on framework's past, present and future
In a Q&A as Hadoop reached an initial 10-year milestone, co-creator Doug Cutting discussed user adoption, development priorities and what things will look like in another five years.