Using big data and Hadoop 2: New version enables new applications

Last updated: March 2014

Editor's note

Early adopters of Apache Hadoop, including high-profile users such as Yahoo, Facebook and Google, had to rely on the partnership of the Hadoop Distributed File System (HDFS) and the MapReduce programming and resource management environment. Together, those technologies enabled users to process, manage and store large amounts of structured, unstructured and semi-structured data in Hadoop clusters.

But there were limitations inherent in the Hadoop-MapReduce pairing. For example, Yahoo and other users have said that, because of MapReduce's batch processing format, the first generation of Hadoop technology couldn't keep pace with the deluge of information they're collecting online.

Hadoop 2, an upgrade released by the Apache Software Foundation in October 2013, offers performance improvements that can benefit related technologies in the Hadoop ecosystem, including the HBase database and Hive data warehouse. But the most notable addition in Hadoop 2 -- which was originally referred to as Hadoop 2.0 -- is YARN, a new component that takes over MapReduce's resource management and job scheduling duties. YARN (short for Yet Another Resource Negotiator) enables users to deploy Hadoop systems without MapReduce. Running MapReduce applications is still an option, but other kinds of programs can now be run natively as well -- for example, real-time querying and streaming data applications.

The enhanced flexibility opens the door to broader uses for big data and Hadoop 2 implementations. In addition, YARN allows users to consolidate multiple Hadoop clusters into one system to lower costs and streamline management tasks. The upgrades in Hadoop 2 also boost cluster availability and scalability, two other issues that held back the first version of Hadoop.

Even with the added capabilities, Hadoop 2 still has a long way to go in moving beyond the early adopter stage, particularly in mainstream IT shops. But the new version heralds a maturing technology and a revamped concept for developing and implementing big data applications. This guide explores the features of Hadoop 2 and potential new uses for Hadoop tools and systems with insight and advice from experienced Hadoop users as well as industry analysts and consultants.

1. Maximizing the potential of Hadoop 2: Opportunities and challenges

Hadoop 2 can support applications in a wider range of programming modes and data-crunching capacities. In addition, the Hadoop framework is being tapped for uses such as mainframe modernization and mobile app development. In this section, learn about new trends in the use of Hadoop and hurdles that could get in the way of the technology -- and Hadoop users.

2. Weighing Hadoop 2's place in business analytics and operations

In this section, discover how Hadoop 2 supports business analytics and enterprise operations -- and get advice on what's needed to make the potential uses a reality in companies wanting to take advantage of its added functions. Consultants and experienced users discuss what Hadoop 2 has to offer and what challenges stand in the way of getting valuable business benefits from the upgraded Hadoop framework.

3. How well do you know the Hadoop ecosystem?

Take this brief quiz to test what you know about the Hadoop framework.