PRO+ Premium Content/Business Information

December 2015, Volume 3, Number 6

MapReduce(d) in the eyes of many Hadoop systems users

Poor MapReduce. Until late 2013, it was a critical cog in all Hadoop systems, serving as both the cluster resource manager and the primary programming and processing environment for the open source big data framework. But then things started to change. The Apache Software Foundation's Hadoop 2 release added a new technology called YARN that usurped the resource management role and opened up Hadoop to applications other than MapReduce batch jobs. A still-growing gaggle of vendors rolled out SQL-on-Hadoop tools that let users write analytical queries against Hadoop data in standard SQL instead of MapReduce. And the Spark processing engine burst onto the scene, with proponents claiming it can run batch jobs up to 100 times faster than MapReduce, while supporting higher-level programming in popular languages such as Java and Python. With all those forces arrayed against it, MapReduce has been, er, reduced in stature -- like an old steam engine being forced to give way to sleeker diesel locomotives. That sense was palpable at the ...
