Apache Spark is an open source data processing engine that emerged from the labs at the University of California, Berkeley, in 2010 and burst onto the big data scene in 2014. The Apache Software Foundation released Version 1.0.0 of Spark in May 2014, and big data vendors have lit a marketing fire under the technology, touting it as a faster and more flexible alternative to MapReduce for processing and analyzing Hadoop data.
The buzz: Spark addresses some of the shortcomings of MapReduce, Hadoop's original processing engine. At Spark's heart is an in-memory computing layer that proponents say can run batch-processing programs up to 100 times faster than MapReduce can. Spark is also a more general-purpose technology, suited to machine learning, streaming, graph processing and SQL querying applications in addition to batch jobs. And it offers high-level APIs and libraries, which make application development easier than it is with low-level MapReduce code.
The reality: Thus far, Spark has gotten far more vendor hype than user adoption. And it has plenty of maturing to do. For example, tools that connect it to SQL are very new. Also, its in-memory capabilities may prove to be expensive for some uses. And while its APIs are less complex than MapReduce's, they're beyond the ken of most enterprise developers. It’s still possible that Spark could flame out instead of burning brightly.