Conference Coverage

Strata + Hadoop World 2016: Hadoop and Spark in spotlight

A roundup of news, trends and analysis from this year's Strata + Hadoop World conference and previous editions of the event, which focuses on Hadoop, Spark and other big data technologies.


Hadoop and the Spark data processing engine now share the spotlight at Strata + Hadoop World, which focuses on big data management and analytics technologies. And the Strata + Hadoop World 2016 conferences held in the U.S., one in the spring and one in the fall, took place at a noteworthy time in the evolution of both Hadoop and Spark.

Hadoop turns 10 this year, at least by some measures. For example, The Apache Software Foundation created a separate open source subproject for managing Hadoop development in January 2006, and the first public release of Hadoop code followed that April. Now the distributed processing framework is at something of a crossroads, as its original core components get augmented -- and perhaps supplanted -- by other technologies. In particular, MapReduce, initially Hadoop's all-in-one cluster resource manager and programming and processing environment, is being shunted aside by a combination of Spark, SQL-on-Hadoop query engines and the YARN resource management platform underpinning Hadoop 2. Users also have alternatives to the Hadoop Distributed File System (HDFS) for storing data -- for example, the Kudu columnar data store that Hadoop distribution market leader Cloudera Inc. introduced at the 2015 Strata conference in New York for use in real-time analytics applications involving streaming data.

Meanwhile, version 2.0 of the open source Apache Spark software was released in July 2016 with updates to its stream processing, machine learning and Spark SQL modules, plus a promised performance boost. Hadoop and Spark are often paired in deployments, with the latter being used to accelerate the processing of data stored in the former. But Spark can also run on its own against data in other platforms, such as NoSQL database systems or cloud-based Amazon Simple Storage Service implementations. Spark proponents and some industry analysts foresee a possible future in which today's Hadoop ecosystem is joined by one surrounding Spark.

New Spark and Hadoop developments, and the relationship between the two technologies, were prominent discussion topics at Strata + Hadoop World 2016 in both San Jose, Calif., in March and New York in September. Numerous presentations on big data trends and best practices for managing deployments were also in the spotlight at the conferences, which were jointly organized by Cloudera and O'Reilly Media Inc. In the sections below, you'll find our coverage of the two events, plus stories from last year's Strata conferences and other content on Hadoop, Spark and related technologies for managing and analyzing pools of big data.

1. Strata conference stories

Reporting from Strata + Hadoop World 2016 and 2015

Strata + Hadoop World includes a mix of sessions featuring IT managers, data scientists and data engineers from user organizations, as well as CTOs, software developers and other representatives from big data vendors. This section compiles news, trend and feature stories based on presentations and interviews at the 2015 and 2016 Strata conferences.


Big data analytics get a boost with real-time streaming

Comcast Corp. and other companies are turning to real-time processing and analytics technologies like Apache Kudu to help find useful information in streams of big data.


IT vendors aim to make it easier to move to Hadoop Cloud

As users continue to shift big data management and analytics apps to the cloud, vendors are working to ease the process -- and lower the price -- of migrating Hadoop operations.


Hadoop co-creator looks back, and ahead

Doug Cutting chats with SearchDataManagement at Strata + Hadoop World about Hadoop's origins, and offers some advice developers can consider for the next decade.


Data streaming takes center stage at Strata Hadoop 2016

Hadoop-based applications are increasingly including Spark Streaming, Kafka and other components that enable real-time streaming analytics capabilities. Find out why.


Tips for finding, keeping data scientists

Experts at Strata + Hadoop World 2016 discussed how to build up and lead data science teams.


Is there a place for Hadoop core technology in big data's future?

Will Spark replace MapReduce and diminish the role of the Hadoop framework in the enterprise? Hadoop co-creator Doug Cutting weighed in at Strata + Hadoop World 2016.


Hadoop's changing identity may leave MapReduce behind

Initially a critical cog in Hadoop clusters, MapReduce is being reduced in stature by newer technologies -- a sense that was palpable at Strata + Hadoop World 2015 in New York.


Is Spark a companion to Hadoop -- or a potential competitor?

We asked attendees at the 2015 Strata conference in New York whether they see the Spark processing engine more as a complement to Hadoop or a possible alternative to it.


Hadoop vendors expand their horizons beyond core components

Hadoop distribution vendors Cloudera and Hortonworks announced new technologies that look beyond the Hadoop Distributed File System as a data store for some applications.


Wal-Mart taps Hadoop to help power new online applications

In a Strata session, the CTO of the retail giant's unit detailed its use of a Hadoop-based repository to drive several applications mixing online and in-store data.


Quiet down and think more creatively, Strata speaker says

The Strata event usually focuses on what users can do with big data technologies. But one speaker in New York recommended disconnecting at times to do some creative thinking.


Top Hadoop rivals spar over planned interoperability effort

A plan by some Hadoop vendors to create an Open Data Platform initiative sparked competing claims between them and nonparticipants at Strata + Hadoop World 2015 in San Jose.


Big data platforms give machine learning initiatives a boost

Strata attendees said Hadoop, Spark and other big data analytics and management tools are helping to ease processing limitations that have held back machine learning applications.


Legal departments get a say in customer analytics programs

Some companies are involving corporate lawyers in analytics efforts to ensure that sensitive customer data isn't abused. But simply locking down data can hamstring data analysts.


Data storytelling becomes a sought-after analytics skill

At Strata 2015 in San Jose, analytics exec Pamela Peele said one of the key people on her team is a former journalist hired to help communicate analytical findings to business managers.

2. Big data trends

New developments in Hadoop, Spark and related technologies

Big data platforms are evolving rapidly, with some open source technologies getting four or more releases annually. Even Hadoop, now 10 years on from its initial development, is far from settled as a technology. This section includes stories on recent news, trends and user deployments involving Hadoop and Spark, as well as other big data technologies.


Sellpoints buys into Spark SQL for big-data ETL workloads

Online marketing services provider Sellpoints Inc. is using Spark, including the technology's SQL programming module, to prepare incoming streams of Web activity data for analysis.


Spark moving to a more mature stage, consultant says

Consultant Thomas W. Dinsmore sees Spark reaching a new level of maturity, one that includes more realistic assessments of the performance improvements it can provide.


Hortonworks puts its Hadoop distribution on two 'cadences'

Hadoop vendor Hortonworks is creating separate release streams for the big data framework's core components and fast-evolving technologies such as Spark, Hive and HBase.


Spark systems accelerate data jobs, usher MapReduce out

Several users who spoke at Spark Summit East 2016 in New York discussed their reasons for deploying Spark, including its ability to outpace MapReduce on many Hadoop batch jobs.


Stream processing update at center of Spark 2.0 release

At this year's Spark Summit East event, Spark creator Matei Zaharia detailed what's coming in the next version of the technology, including enhancements to its Spark Streaming module.


SQL-on-Hadoop tools offer gateway to broader Hadoop use

Emerging SQL-on-Hadoop query engines open up data in Hadoop to the legions of programmers with SQL skills, potentially enabling increased Hadoop adoption by organizations.


Spark, NoSQL databases link up for operational analytics push

Spark is primarily known for pairing up with Hadoop, but connectors that tie it to NoSQL databases are also being tapped by users looking to analyze operational data in real time.


Hadoop co-creator on framework's past, present and future

In a Q&A as Hadoop reached an initial 10-year milestone, co-creator Doug Cutting discussed user adoption, development priorities and what things will look like in another five years.

3. Big data definitions

Terms you'll hear at Strata + Hadoop World 2016

Read the definitions included in this section to learn more about big data technologies, techniques and processes.
