Apache Spark trumps MapReduce in speed, flexibility

The processing engine’s in-memory computing layer can supposedly run batch-processing programs up to 100 times faster than MapReduce. But does the vendor hype equal user adoption?

This article can also be found in the Premium Editorial Download "Business Information: Launching big data initiatives? Be choosy about the data."

Apache Spark is an open source data processing engine that emerged from the labs at the University of California, Berkeley, in 2010 and burst onto the big data scene in a big way in 2014. The Apache Software Foundation released Version 1.0.0 of Spark in May 2014, and big data vendors have lit a marketing fire under the technology, touting it as a faster and more flexible alternative to MapReduce for processing and analyzing Hadoop data.

The buzz: Spark addresses some of the shortcomings of MapReduce, Hadoop's original processing engine. At Spark's heart is an in-memory computing layer that proponents say can run batch-processing programs up to 100 times faster than MapReduce can. Spark is also a more general-purpose technology, suited to machine learning, streaming, graph processing and SQL querying applications in addition to batch jobs. And it offers high-level APIs and libraries, making application development easier than it is with greasy and grimy MapReduce.
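The in-memory argument is easiest to see on an iterative workload, such as machine learning, where the same data set is scanned many times. The Python sketch below is a toy model, not Spark's or Hadoop's API: `ToyStorage`, `mapreduce_style`, `spark_style` and `count_above` are hypothetical names, and the "disk" is just a read counter. It assumes that a chain of MapReduce jobs re-reads its input from storage on every pass, while a Spark-style engine loads the data once and keeps it in memory, loosely like calling `cache()` on a Spark RDD.

```python
from typing import Callable, Iterable, List

# A single pass of an iterative job: (data, iteration number) -> result.
Step = Callable[[List[int], int], int]


class ToyStorage:
    """Hypothetical stand-in for HDFS: holds records and counts each full read."""

    def __init__(self, records: Iterable[int]) -> None:
        self._records = list(records)
        self.disk_reads = 0

    def read(self) -> List[int]:
        # Every call models one full scan of the input from disk.
        self.disk_reads += 1
        return list(self._records)


def count_above(data: List[int], threshold: int) -> int:
    """One pass of a toy iterative job: count records above a threshold."""
    return sum(1 for x in data if x > threshold)


def mapreduce_style(storage: ToyStorage, iterations: int, step: Step) -> List[int]:
    # Each iteration behaves like a separate MapReduce job that
    # reads its input from storage again before doing any work.
    results = []
    for i in range(iterations):
        data = storage.read()
        results.append(step(data, i))
    return results


def spark_style(storage: ToyStorage, iterations: int, step: Step) -> List[int]:
    # Load once and keep the working set in memory (loosely analogous
    # to caching a Spark RDD), then reuse it on every pass.
    cached = storage.read()
    return [step(cached, i) for i in range(iterations)]


if __name__ == "__main__":
    mr = ToyStorage(range(10))
    sp = ToyStorage(range(10))
    print(mapreduce_style(mr, 5, count_above), mr.disk_reads)  # [9, 8, 7, 6, 5] 5
    print(spark_style(sp, 5, count_above), sp.disk_reads)      # [9, 8, 7, 6, 5] 1
```

In this toy run both approaches return the same counts; the difference is that the MapReduce-style loop scans storage five times and the cached version once. The model deliberately ignores the cost of holding data in memory, which is one of the trade-offs raised below.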

The reality: Thus far, Spark has gotten far more vendor hype than user adoption, and it has plenty of maturing to do. For example, tools that connect it to SQL, such as Spark SQL, are very new. Also, its in-memory capabilities may prove expensive for some uses, because memory costs more than disk storage. And while its APIs are less complex than MapReduce's, they're still beyond the ken of most enterprise developers. Spark could yet flame out instead of burning brightly.

Lighting a spark timeline

This was last published in February 2015