This article is part of an Essential Guide, our editor-selected collection of our best articles, videos and other content on this topic. Explore more in this guide:
4. - Brush up on your Hadoop vocabulary: Read more in this section
Explore other sections in this guide:
Apache Hadoop 2 (Hadoop 2.0) is the second iteration of the Hadoop framework for distributed data processing.
Hadoop 2 adds support for running non-batch applications through the introduction of YARN, a redesigned cluster resource manager that eliminates Hadoop's sole reliance on the MapReduce programming model. Short for Yet Another Resource Negotiator, YARN puts resource management and job scheduling functions in a separate layer beneath the data processing one, enabling Hadoop 2 to run a variety of applications. Overall, the changes made in Hadoop 2 position the framework for wider use in big data analytics and other enterprise applications. For example, it is now possible to run event processing as well as streaming, real-time and operational applications. The capability to support programming frameworks other than MapReduce also means that Hadoop can serve as a platform for a wider variety of analytical applications.
Hadoop 2 also includes new features designed to improve system availability and scalability. For example, it introduced an Hadoop Distributed File System (HDFS) high-availability (HA) feature that brings a new NameNode architecture to Hadoop. Previously, Hadoop clusters had one NameNode that maintained a directory tree of HDFS files and tracked where data was stored in a cluster. The Hadoop 2 high-availability scheme allows users to configure clusters with redundant NameNodes, removing the chance that a lone NameNode will become a single point of failure (SPoF) within a cluster. Meanwhile, a new HDFS federation capability lets clusters be built out horizontally with multiple NameNodes that work independently but share a common data storage pool, offering better compute scaling as compared to Apache Hadoop 1.x.
Hadoop 2 also added support for Microsoft Windows and a snapshot capability that makes read-only point-in-time copies of a file system available for data backup and disaster recovery (DR). In addition, the revision offers all-important binary compatibility with existing MapReduce applications built for Hadoop 1.x releases.