Hadoop 2

This definition is part of our Essential Guide: Using big data platforms for data management, access and analytics
Contributor(s): Jack Vaughan

Apache Hadoop 2 (Hadoop 2.0) is the second iteration of the Hadoop framework for distributed data processing.  

Hadoop 2 adds support for running non-batch applications through the introduction of YARN, a redesigned cluster resource manager that ends Hadoop's sole reliance on the MapReduce programming model. Short for Yet Another Resource Negotiator, YARN moves resource management and job scheduling into a separate layer beneath the data processing layer, so MapReduce becomes one of several processing frameworks that can share a cluster. As a result, Hadoop 2 can run event processing, streaming, real-time and operational applications alongside batch jobs. Overall, these changes position the framework for wider use in big data analytics and other enterprise applications.
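The separation of resource management from data processing can be illustrated with a toy model. The sketch below is not the YARN API: `ToyResourceManager`, `Container` and the application names are hypothetical, and real YARN scheduling is far more involved. It only shows the idea that one layer hands out memory and CPU to applications of any kind, without knowing which programming model they use.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slice of cluster resources granted to one application."""
    app_id: str
    memory_mb: int
    vcores: int


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager."""
    total_memory_mb: int
    total_vcores: int
    containers: list = field(default_factory=list)

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(c.memory_mb for c in self.containers)

    def free_vcores(self) -> int:
        return self.total_vcores - sum(c.vcores for c in self.containers)

    def request(self, app_id, memory_mb, vcores):
        """Grant a container if capacity allows, otherwise return None."""
        if memory_mb <= self.free_memory_mb() and vcores <= self.free_vcores():
            container = Container(app_id, memory_mb, vcores)
            self.containers.append(container)
            return container
        return None

    def release(self, app_id):
        """Free every container held by a finished application."""
        self.containers = [c for c in self.containers if c.app_id != app_id]


# The manager does not care whether the requester is a batch job or a
# long-running stream processor; it only tracks resources.
rm = ToyResourceManager(total_memory_mb=8192, total_vcores=8)
batch = rm.request("mapreduce-job", 4096, 4)
stream = rm.request("stream-processor", 2048, 2)
too_big = rm.request("interactive-query", 4096, 2)  # exceeds remaining memory
```

In this model the third request is refused until the batch job releases its containers, which is the kind of arbitration a shared resource layer performs for every framework running on the cluster.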

Hadoop 2 also includes new features designed to improve system availability and scalability. For example, it introduced a Hadoop Distributed File System (HDFS) high-availability (HA) feature that brings a new NameNode architecture to Hadoop. Previously, a Hadoop cluster had a single NameNode that maintained the directory tree of HDFS files and tracked where data was stored in the cluster. The Hadoop 2 HA scheme lets users configure clusters with redundant NameNodes, one active and one on standby, so that a lone NameNode is no longer a single point of failure (SPoF). Meanwhile, a new HDFS federation capability lets clusters be built out horizontally with multiple NameNodes that work independently but share a common data storage pool, so the file system namespace scales better than in Apache Hadoop 1.x.
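As a rough illustration of what configuring redundant NameNodes involves, the sketch below generates an `hdfs-site.xml` fragment for an HA pair. The property names are the standard HDFS HA settings, but the nameservice ID `mycluster`, the NameNode IDs `nn1` and `nn2`, and all host names are placeholder assumptions, and the helper functions `ha_properties` and `to_hdfs_site_xml` are hypothetical. A working cluster needs further settings (fencing, automatic failover) that are omitted here.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenodes, journal_uri):
    """Return core HDFS HA settings as a dict of property name -> value.

    namenodes maps a NameNode ID to its RPC address (host:port).
    journal_uri is the shared edit log location used by both NameNodes.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": journal_uri,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


def to_hdfs_site_xml(props):
    """Serialize the properties in the Hadoop <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder hosts; substitute the addresses of a real cluster.
props = ha_properties(
    nameservice="mycluster",
    namenodes={
        "nn1": "namenode1.example.com:8020",
        "nn2": "namenode2.example.com:8020",
    },
    journal_uri="qjournal://jn1.example.com:8485;jn2.example.com:8485;"
                "jn3.example.com:8485/mycluster",
)
xml_text = to_hdfs_site_xml(props)
print(xml_text)
```

Clients address the logical nameservice rather than a specific host, which is what allows a standby NameNode to take over without reconfiguring applications.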

Hadoop 2 also added support for Microsoft Windows and a snapshot capability that makes read-only point-in-time copies of a file system available for data backup and disaster recovery (DR). In addition, the release maintains binary compatibility with existing MapReduce applications built for Hadoop 1.x, so those applications can run on Hadoop 2 without being recompiled.
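The snapshot workflow can be sketched as two command-line steps: an administrator marks a directory as snapshottable, then a snapshot is taken of it. The example below only assembles the `hdfs` commands without running them. The directory `/data/warehouse`, the snapshot name `nightly-backup` and the helper functions are hypothetical.

```python
import posixpath
import shlex


def snapshot_commands(directory, snapshot_name):
    """Build the hdfs CLI calls that enable and then create a snapshot."""
    return [
        # Administrator step: allow snapshots on the directory.
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # Create a read-only point-in-time copy with the given name.
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot_name],
    ]


def snapshot_path(directory, snapshot_name):
    """Snapshots are exposed under the directory's .snapshot subdirectory."""
    return posixpath.join(directory, ".snapshot", snapshot_name)


commands = snapshot_commands("/data/warehouse", "nightly-backup")
for cmd in commands:
    # To execute for real, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))

print(snapshot_path("/data/warehouse", "nightly-backup"))
```

A backup job could then read from the snapshot path while the live directory continues to change.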

This was last updated in January 2014
