Hadoop 2 definition

This definition is part of our Essential Guide: Using big data and Hadoop 2: New version enables new applications
Contributor(s): Jack Vaughan

Apache Hadoop 2 (Hadoop 2.0) is the second iteration of the Hadoop framework for distributed data processing.  

Hadoop 2 adds support for running non-batch applications through the introduction of YARN, a redesigned cluster resource manager that eliminates Hadoop's sole reliance on the MapReduce programming model. Short for Yet Another Resource Negotiator, YARN puts resource management and job scheduling functions in a separate layer beneath the data processing one, enabling Hadoop 2 to run a variety of applications. Overall, the changes made in Hadoop 2 position the framework for wider use in big data analytics and other enterprise applications. For example, it is now possible to run event processing as well as streaming, real-time and operational applications. The capability to support programming frameworks other than MapReduce also means that Hadoop can serve as a platform for a wider variety of analytical applications.

Hadoop 2 also includes new features designed to improve system availability and scalability. For example, it introduced an Hadoop Distributed File System (HDFS) high-availability (HA) feature that brings a new NameNode architecture to Hadoop. Previously, Hadoop clusters had one NameNode that maintained a directory tree of HDFS files and tracked where data was stored in a cluster. The Hadoop 2 high-availability scheme allows users to configure clusters with redundant NameNodes, removing the chance that a lone NameNode will become a single point of failure (SPoF) within a cluster. Meanwhile, a new HDFS federation capability lets clusters be built out horizontally with multiple NameNodes that work independently but share a common data storage pool, offering better compute scaling as compared to Apache Hadoop 1.x.

Hadoop 2 also added support for Microsoft Windows and a snapshot capability that makes read-only point-in-time copies of a file system available for data backup and disaster recovery (DR). In addition, the revision offers all-important binary compatibility with existing MapReduce applications built for Hadoop 1.x releases.

This was first published in January 2014

Continue Reading About Hadoop 2

PRO+

Content

Find more PRO+ content and other member only offers, here.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchBusinessAnalytics

SearchAWS

SearchContentManagement

SearchOracle

SearchSAP

SearchSOA

SearchSQLServer

Close