Business Information

Apache Spark trumps MapReduce in speed, flexibility
The processing engine's in-memory computing layer can reportedly run batch-processing programs up to 100 times faster than MapReduce. Does user adoption match the vendor hype?
Apache Spark is an open source data processing engine that emerged from the labs at the University of California, Berkeley, in 2010 and burst onto the big data scene in 2014. The Apache Software Foundation released Version 1.0.0 of Spark in May 2014, and big data vendors have since lit a marketing fire under the technology, touting it as a faster and more flexible alternative to MapReduce for processing and analyzing Hadoop data.
The buzz: Spark addresses some of the shortcomings of MapReduce, Hadoop's original processing engine. At Spark's heart is an in-memory computing layer that proponents say can run batch-processing programs up to 100 times faster than MapReduce can. Spark is also a more general-purpose technology, suited to machine learning, streaming, graph processing and SQL querying applications in addition to batch jobs. And it offers high-level APIs and libraries, making application development easier than it is with the lower-level MapReduce programming model.
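To make the development-effort comparison concrete, here is a minimal sketch of the map, shuffle and reduce steps that a MapReduce-style word count spells out explicitly. It is plain Python, not Hadoop or Spark code. The function names and the sample input are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a MapReduce mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases into one job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, invented for this sketch.
sample_lines = ["spark is fast", "mapreduce is batch", "spark is flexible"]
counts = word_count(sample_lines)
print(counts["spark"], counts["is"], counts["batch"])
```

In Spark's Python RDD API, the same job is commonly written as a single chain of `flatMap`, `map` and `reduceByKey` calls, and calling `cache()` on an RDD asks Spark to keep it in memory for reuse across later computations. That is the kind of brevity and in-memory reuse the proponents are pointing to, though how much it helps will depend on the workload.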
The reality: Thus far, Spark has gotten far more vendor hype than user adoption. And it has plenty of maturing to do. For example, tools that connect it to SQL are very new. Also, its in-memory capabilities may prove to be expensive for some uses. And while its APIs are less complex than MapReduce's, they're beyond the ken of most enterprise developers. It’s still possible that Spark could flame out instead of burning brightly.
