Apache Hadoop 2 (Hadoop 2.0) is the second iteration of the Hadoop framework for distributed data processing.
Hadoop 2 adds support for running non-batch applications through the introduction of YARN, a redesigned cluster resource manager that eliminates Hadoop's sole reliance on the MapReduce programming model. Short for Yet Another Resource Negotiator, YARN puts resource management and job scheduling functions in a separate layer beneath the data processing one, enabling Hadoop 2 to run a variety of applications. Overall, the changes made in Hadoop 2 position the framework for wider use in big data analytics and other enterprise applications. For example, it is now possible to run event processing as well as streaming, real-time and operational applications. The capability to support programming frameworks other than MapReduce also means that Hadoop can serve as a platform for a wider variety of analytical applications.
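To make the idea of a separate resource management layer more concrete, here is a minimal toy model in Python. It is not the YARN API: the `Node` and `ToyResourceManager` classes, the first-fit placement rule, and the application names are all hypothetical, chosen only to illustrate how one scheduler can hand out containers to different kinds of applications on the same cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker machine with some free memory and CPU (hypothetical model)."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class ToyResourceManager:
    """Toy stand-in for a cluster-wide resource manager.

    It knows nothing about what an application does (batch, streaming,
    interactive); it only tracks resources and grants containers.
    """
    nodes: list
    allocations: list = field(default_factory=list)

    def allocate(self, app, mb, vcores):
        """Grant a container on the first node with enough room (first fit).

        Returns the node name, or None if no node can host the container.
        """
        for node in self.nodes:
            if node.free_mb >= mb and node.free_vcores >= vcores:
                node.free_mb -= mb
                node.free_vcores -= vcores
                self.allocations.append((app, node.name, mb, vcores))
                return node.name
        return None


if __name__ == "__main__":
    rm = ToyResourceManager([Node("node1", 8192, 4), Node("node2", 4096, 2)])
    # Three different kinds of workload share the same pool of resources.
    for app in ("mapreduce-batch", "stream-processor", "interactive-sql"):
        print(app, "->", rm.allocate(app, mb=4096, vcores=2))
```

The point of the sketch is the separation of concerns: the scheduler treats a MapReduce job and a streaming job identically, which is what allows processing frameworks other than MapReduce to run on the same cluster.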
Hadoop 2 also includes new features designed to improve system availability and scalability. For example, it introduced a Hadoop Distributed File System (HDFS) high-availability (HA) feature that brings a new NameNode architecture to Hadoop. Previously, Hadoop clusters had one NameNode that maintained a directory tree of HDFS files and tracked where data was stored in a cluster. The Hadoop 2 high-availability scheme allows users to configure clusters with redundant NameNodes, so that a lone NameNode is no longer a single point of failure (SPoF) within a cluster. Meanwhile, a new HDFS federation capability lets clusters be built out horizontally with multiple NameNodes that work independently but share a common data storage pool, so the file system namespace can scale beyond what the single NameNode of Apache Hadoop 1.x allowed.
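As an illustration of what configuring redundant NameNodes involves, the following Python sketch assembles the core `hdfs-site.xml` properties for an HA pair that shares its edit log through a Quorum Journal Manager. The property names are the standard HDFS HA settings; the nameservice ID `mycluster`, the NameNode IDs, and every host name are placeholder assumptions, and the helper functions are hypothetical conveniences rather than part of Hadoop. A real deployment needs additional settings, such as fencing and automatic failover.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenodes, journal_nodes):
    """Build a minimal set of HDFS HA properties.

    namenodes: dict mapping a NameNode ID to its RPC address ("host:port").
    journal_nodes: list of JournalNode addresses ("host:port").
    """
    props = {
        # Logical name clients use instead of a single NameNode host.
        "dfs.nameservices": nameservice,
        # IDs of the redundant NameNodes in this nameservice.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edit log location (Quorum Journal Manager URI).
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journal_nodes) + "/" + nameservice,
        # Client-side class that locates the currently active NameNode.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


def to_hdfs_site_xml(props):
    """Render the properties in the <configuration> format Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All host names below are placeholders.
    props = ha_properties(
        "mycluster",
        {"nn1": "namenode1.example.com:8020",
         "nn2": "namenode2.example.com:8020"},
        ["jn1.example.com:8485", "jn2.example.com:8485",
         "jn3.example.com:8485"],
    )
    print(to_hdfs_site_xml(props))
```

Because clients address the logical nameservice rather than a specific host, a failover from one NameNode to the other does not require changes on the client side.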
Hadoop 2 also added support for Microsoft Windows and a snapshot capability that makes read-only point-in-time copies of a file system available for data backup and disaster recovery (DR). In addition, the revision offers all-important binary compatibility with existing MapReduce applications built for Hadoop 1.x releases.