Essential Guide

Managing Hadoop projects: What you need to know to succeed

Learn valuable information about the Hadoop ecosystem and framework in this guide -- its capabilities, its limitations and its place in a big data management and analytics architecture.


Early adopters of Apache Hadoop, including high-profile users such as Yahoo, Facebook and Google, had to rely on the partnership of the Hadoop Distributed File System (HDFS) and the MapReduce programming and resource management environment. Together, those technologies enabled users to process, manage and store large amounts of structured, unstructured and semi-structured data in Hadoop clusters.

But there were limitations inherent in the Hadoop-MapReduce pairing. For example, Yahoo and other users have cited issues with the first generation of Hadoop technology not being able to keep pace with the deluge of information they're collecting online because of MapReduce's batch processing format.
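To make the batch constraint concrete, here is a minimal sketch of the map, shuffle and reduce phases in plain Python. It is an illustration only, not Hadoop's actual API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_batch_job`) and the sample records are assumptions invented for this example. The point it shows is that the reduce step cannot start until the whole input has been mapped and grouped, which is why results arrive only after the full job finishes.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key; this consumes the complete map output."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_batch_job(records):
    """Run the three phases in order over the whole data set (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical sample input, processed as one batch.
counts = run_batch_job(["big data big clusters", "data in Hadoop clusters"])
```

In this toy model, a record that arrives after `run_batch_job` has started is not reflected in `counts` until the job is run again over the enlarged input, which mirrors the latency complaint described above.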

Hadoop 2, an upgrade released by the Apache Software Foundation in October 2013, offers performance improvements that can benefit related technologies in the Hadoop ecosystem, including the HBase database and Hive data warehouse. But the most notable addition in Hadoop 2 -- which originally was referred to as Hadoop 2.0 -- is YARN, a new component that takes over MapReduce's resource management and job scheduling duties. YARN (short for Yet Another Resource Negotiator) enables users to deploy Hadoop systems without MapReduce. Running MapReduce applications is still an option, but other kinds of programs can now be run natively as well -- for example, real-time querying and streaming data applications. The enhanced flexibility opens the door to broader uses for big data and Hadoop projects; in addition, YARN allows users to consolidate multiple Hadoop clusters into one system to lower costs and streamline management tasks. The upgrades in Hadoop 2 also boost cluster availability and scalability, two other issues that held back the first version of the framework.

Even with the added capabilities, Hadoop 2 still has a long way to go in moving beyond the early adopter stage, particularly in mainstream IT shops. But the new version heralds a maturing technology and a revamped concept for developing and implementing big data applications. This guide explores the features of Hadoop 2 and potential new uses for Hadoop tools and systems, with insight and advice from experienced users as well as industry analysts and consultants.

1. Understanding and using Hadoop

Elucidating benefits, myths and facts about Hadoop

Before deciding to implement the Hadoop framework as a tool for managing and analyzing big data, IT decision makers should understand exactly what Hadoop is and how it works. In the articles in this section, experienced users and industry analysts discuss the potential benefits of Hadoop projects, dispel myths surrounding the technology and explore how using Hadoop clusters can generate a return on investment for organizations.


Deploying Hadoop clusters is new territory for many IT teams

Hadoop projects can present managers with unfamiliar challenges -- particularly because many organizations still lack experience with the framework.

Feature groups team up on Hadoop system deployment

Data scientists and software engineers at a genealogy information provider worked together to employ Hadoop to support a DNA matching application.


Security services provider uses Hadoop, HBase to ease data flood

Solutionary Inc. is using a system based on Hadoop and its companion HBase database to help manage log data that it combs through in an effort to detect network security threats.


In enterprise apps, Hadoop needs to play nice with existing processes

In a panel discussion at the Hadoop Summit 2013, several enterprise users offered advice on putting Hadoop systems into action in real business applications.


Hadoop projects: Bringing big data into data warehouse environments

Hadoop helps IT teams looking to efficiently mix pools of big data with the information stored in enterprise data warehouses.


Twelve common Hadoop myths debunked

The Data Warehousing Institute's Philip Russom deconstructs a dozen misconceptions about Hadoop and gives his take on the realities of deploying and managing it.


What's the interest in Hadoop all about?

Consultant Tom Nolle examines the real reasons why so many people are interested in Hadoop -- and what's required to successfully implement the technology.

2. Hadoop's ongoing evolution

Keeping up with Hadoop news and trends

As with other technologies, Hadoop is continually evolving to meet shifting big data management needs and business goals. The articles in this section catalog Hadoop technology trends, offering a look at new functionality, expanding applications and supporting tools in the Hadoop ecosystem.


Hadoop 2 expands potential applications -- and issues to weigh

The first version of Hadoop was primarily limited to running MapReduce batch-processing jobs. Hadoop 2 supports more applications, but users still face deployment challenges.


Role eyed for Hadoop in modernization, migration of mainframe apps

Hadoop systems are provoking changes in traditional data warehouse environments. And they might also alter the status quo for mainframe modernization and migration efforts.


Searching for the 'unknown unknowns' with Hadoop and Lucene

Software vendors are increasingly pairing Hadoop with the open source Lucene search engine in an effort to improve users' ability to search for information in pools of big data.


Storm-YARN pairing points to a Hadoop development schism

Yahoo's combination of Hadoop's YARN resource manager and the Storm event processor highlights the gap between enterprise needs and those of large Internet companies.


What's driving the rising importance of Hadoop management tools

The need for tools that can help manage Hadoop clusters is increasing as more users move beyond experiments and deploy the open source framework in real applications.


New enterprise features needed as Hadoop use expands

Hadoop vendors are trying to clear the way for increased adoption of the technology by offering add-ons that target issues keeping some corporate users from moving forward.


How many Hadoop distributions does the world need?

EMC and Intel entered the Hadoop ring by releasing distributions of the software in 2013, increasing the number of choices for prospective users to contend with.


Implementing Hadoop: Storage considerations

Get advice on the issues to take into account in deciding which types of storage systems to use as the primary storage layer for Hadoop data.


Hadoop tools taking on higher-level tasks

It's one thing to build a Hadoop cluster, but making it into a useful system for big data management and analytics requires a greater investment of time and resources.

3. Hadoop issues and shortcomings

Examining issues and weaknesses in the Hadoop ecosystem

While many users find Hadoop projects to be cost-effective and useful, Hadoop has some drawbacks to keep in mind when assessing whether it's the right technology for an organization. In this section, users and analysts discuss where Hadoop falls short, particularly in terms of real costs, ease of management, performance and overall capability, and offer advice on how to avoid deployment problems.


Hadoop still not up to handling real-time analytics?

Software vendors are adding query engines that run on top of Hadoop in an effort to turn it into a real-time data analysis platform. But some roadblocks remain.


Big data without bottlenecks: Avoiding Hadoop throughput snafus

While various issues can bog down the performance of Hadoop systems, there are ways to steer clear of the pitfalls and ensure that your big data applications keep cruising along.


When Hadoop is the right technology to use -- and when it's not

There's no shortage of hoopla about Hadoop, but it isn't the answer to all big data application needs. Smart companies need to make sure it's a good match for their requirements.


Facing up to issues with MapReduce and Hadoop

Big data users can't wish away the challenges of deploying systems based on Hadoop and MapReduce. But taking some good first steps helps minimize the difficulties.


Hadoop data in motion adds challenges for operational BI uses

Developers and data architects building operational business intelligence applications may need to create fast messaging infrastructures to handle streams of Hadoop data.


Analytics users find pros and cons with Hadoop

There are potential advantages to using Hadoop in analytics applications, but it also can pose some hardships that prospective users should be aware of up front.


Big data projects might require more than Hadoop to succeed

Hadoop isn't a magic bullet for meeting big data needs, says Gartner analyst Doug Laney, who offers advice on how to reap the benefits of big data investments.


Panelists discuss pitfalls of Hadoop, other big data technologies

A panel of technology vendors and analysts weighs in on the upsides and downsides of technologies such as Hadoop and MapReduce.


Analysis of Hadoop and big data technologies

Watch the video interviews in this section for analyses and insights into the issues involved in evaluating, deploying and managing Hadoop tools and big data technologies. Well-known consultants and industry analysts share tips on adoption of Hadoop and other big data tools and on how to implement successful big data management and analytics programs.


White: Don't get misled by hype in selecting a Hadoop platform

Consultant Colin White discusses the maturity of Hadoop tools and details some of the key issues to consider when evaluating Hadoop distributions.


Eckerson: Hadoop can bring big data ROI for users

TechTarget analyst Wayne Eckerson discusses the potential benefits and challenges of using Hadoop systems to run big data applications.


Rogers: Skills shortage impedes adoption of big data tools

Shawn Rogers of Enterprise Management Associates explains a common roadblock to adoption of big data systems and technologies -- a lack of big data skills in organizations.


McKnight: Getting started with big data analytics

Consultant William McKnight discusses big data basics, offering practical advice on key issues related to big data management and analysis.


Eckerson: Tips on using big data analytics software and tools

Wayne Eckerson offers advice on using big data analytics technology and shares his view of the big data big picture.


Glossary of Hadoop-related terminology

This glossary offers definitions of key terms pertinent to Hadoop projects and big data initiatives.

6. Hadoop quiz

Test your understanding of the Hadoop ecosystem

Take this brief quiz to see what you have learned about Hadoop.
