Essential Guide

This Essential Guide is a collection of articles, videos and other content selected by our editors to give you a comprehensive view of this topic.

Managing Hadoop projects: What you need to know to succeed

Learn valuable information about the Hadoop ecosystem and framework in this guide -- its capabilities, its limitations and its place in a big data management and analytics architecture.

Companies that need to process large and varied data sets frequently look to Apache Hadoop as a potential tool, because it offers the ability to process, store and manage huge amounts of both structured and unstructured data. The open source Hadoop framework is built on top of a distributed file system and a cluster architecture that enable it to transfer data rapidly and continue operating even if one or more compute nodes fail. But Hadoop isn't a cure-all system for big data application needs as a whole. And while big-name Internet companies like Yahoo, Facebook, Twitter, eBay and Google are prominent users of the technology, Hadoop projects are new undertakings for many other types of organizations.

Some industry analysts assert that Hadoop is still in its adolescent stages, a technology that requires more development and careful examination by prospective users. The release of the second-generation Hadoop 2 software in October 2013 added broader application support and features designed to improve cluster availability and scalability. Even so, Hadoop typically isn't a one-stop-shopping product and must be used in coordination with MapReduce and a range of other complementary technologies from what is referred to as the Hadoop ecosystem.
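For readers unfamiliar with MapReduce, the programming model mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine, not Hadoop's actual API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files spread across a cluster.
    sample = ["big data big clusters", "data nodes fail"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run in parallel on many nodes and the framework would handle the grouping and data movement; the sketch only shows the shape of the computation.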

Although Hadoop is open source, it's by no means free. Companies implementing a Hadoop cluster generally choose one of the commercial distributions of the framework, which carry maintenance and support costs. They also need to pay for hardware and either hire experienced programmers or train existing employees to work with Hadoop, MapReduce and related technologies such as Hive, HBase and Pig.

For many people, big data deployments and Hadoop projects are one and the same. That isn't the case, but Hadoop clearly has a central role to play in big data management and analytics initiatives. Learn more about the Hadoop framework in this guide, which offers different perspectives on Hadoop and explains how the technology can be helpful, where it doesn't measure up and why it isn't going away any time soon.

Understanding and using Hadoop

1. Elucidating benefits, myths and facts about Hadoop

Before deciding to implement the Hadoop framework as a tool for managing and analyzing big data, IT decision makers should understand exactly what Hadoop is and how it works. In the articles in this section, experienced users and industry analysts discuss the potential benefits of Hadoop projects, dispel myths surrounding the technology and explore how using Hadoop clusters can generate a return on investment for organizations.

Hadoop's ongoing evolution

2. Keeping up with Hadoop news and trends

As with other technologies, Hadoop is continually evolving to meet shifting big data management needs and business goals. The articles in this section catalog Hadoop technology trends, offering a look at new functionality, expanding applications and supporting tools in the Hadoop ecosystem.

Hadoop issues and shortcomings

3. Examining issues and weaknesses in the Hadoop ecosystem

While many users find Hadoop projects to be cost-effective and useful, the technology has some drawbacks to keep in mind when assessing whether it's the right fit for an organization. In this section, users and analysts discuss where Hadoop falls short, particularly in terms of real costs, ease of management, performance and overall capability, and offer advice on how to avoid deployment problems.


4. Analysis of Hadoop and big data technologies

Watch the video interviews in this section for analyses and insights into the issues involved in evaluating, deploying and managing Hadoop tools and big data technologies. Well-known consultants and industry analysts share tips on adoption of Hadoop and other big data tools and on how to implement successful big data management and analytics programs.


5. Glossary of Hadoop-related terminology

This glossary offers definitions of key terms pertinent to Hadoop projects and big data initiatives.

Hadoop quiz

6. Test your understanding of the Hadoop ecosystem

Take this brief quiz to see what you have learned about Hadoop.