While there is plenty of chatter about the Hadoop platform, there is also notable buzz around Spark, an emerging data analytics framework.
As open source prodigies, Hadoop and Spark still seek better definition and greater maturity. The 2014 Strata + Hadoop World conference in New York provided a window on both works in progress, according to TechTarget editors who took part in this edition of the Talking Data podcast.
The reporters suggest that understanding the two technologies' roles -- sometimes exclusive, sometimes inclusive -- is useful for the data architect who pursues big data architectures based on commodity computing clusters.
A strong use case for Spark appears to be machine learning, a highly iterative process in which specialized algorithms repeatedly churn through masses of data, TechTarget writer Jack Vaughan told colleague Ed Burns. Spark's in-memory approach is a step forward in terms of performance, but it may have cost ramifications in some cases, according to Vaughan's sources.
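To see why in-memory processing helps iterative workloads, consider that an algorithm such as gradient descent makes many passes over the same dataset. The plain-Python sketch below (not Spark code; the data, learning rate, and model are illustrative assumptions) shows the access pattern: the data is loaded once and reused on every iteration, which is the pattern Spark serves by caching datasets in memory rather than re-reading them from disk on each pass, as a chain of MapReduce jobs would.

```python
# Illustrative sketch of an iterative machine learning loop.
# The key point: `points` is scanned on EVERY iteration, so keeping it
# in memory (as Spark does with cached datasets) pays off repeatedly.

def gradient_descent(points, lr=0.01, iterations=100):
    """Fit y = w * x by least squares; the dataset is reused each pass."""
    w = 0.0
    for _ in range(iterations):
        # One full scan of the data per iteration.
        grad = sum(2 * x * (w * x - y) for x, y in points) / len(points)
        w -= lr * grad
    return w

# Hypothetical toy data: y = 3x, so w should converge toward 3.
points = [(x, 3.0 * x) for x in range(1, 11)]
w = gradient_descent(points)
print(round(w, 3))  # converges to 3.0
```

With dozens or hundreds of such passes, the cost of rereading the input from disk each time dominates; that repeated-scan pattern is what makes machine learning the strong Spark use case the reporters describe, though holding large datasets in memory is also where the cost ramifications Vaughan's sources mention come in.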
Also discussed were a number of Spark-related product announcements made at Strata + Hadoop World, which mark Spark as a technology to watch.
The Strata event also showcased enterprise user presentations describing growing experience with Hadoop and Spark.
"Whether it is Spark or Hadoop, there is interest in these new open source technologies and the idea that a somewhat general programming approach can be applied to them for handling massive amounts of data, and for doing it on commodity clusters," Vaughan said. "Spark and Hadoop may be more related than some people think."
In fact, Spark today is often deployed as a component running on the Hadoop 2.0 platform. In this sense, Hadoop plays a real role in enabling Spark, even as Spark has taken some of the spotlight on the analytics stage.