PRO+ Premium Content/Business Information

February 2015, Volume 3, Number 1

Apache Spark trumps MapReduce in speed, flexibility

Apache Spark is an open source data processing engine that emerged from the labs at the University of California, Berkeley, in 2010 and burst onto the big data scene last year. The Apache Software Foundation released Version 1.0.0 of Spark in May 2014, and big data vendors have lit a marketing fire under the technology, touting it as a faster and more flexible alternative to MapReduce for processing and analyzing Hadoop data.

The buzz: Spark addresses some of the shortcomings of MapReduce, Hadoop's original processing engine. At Spark's heart is an in-memory computing layer that proponents say can run batch-processing programs up to 100 times faster than MapReduce can. Spark is also a more general-purpose technology, suited to machine learning, streaming, graph processing and SQL querying applications in addition to batch jobs. And its high-level APIs and libraries make application development easier than it is with MapReduce.

The reality: Thus far, Spark has gotten far more vendor hype ...
