Managing Hadoop projects: What you need to know to succeed

Last updated: February 2014

Editor's note

Companies that need to process large and varied data sets frequently look to Apache Hadoop because it can process, store and manage huge amounts of structured, semi-structured and unstructured data. The open source Hadoop framework is built on a distributed file system and a cluster architecture that enable it to rapidly ingest and process data for use in analytics applications; Hadoop clusters can also be scaled up easily by adding compute nodes based on commodity servers. But Hadoop isn't a cure-all for big data application needs. And while big-name Internet companies like Yahoo, Facebook, Twitter and eBay are prominent users of the technology, and other leading-edge users are also taking advantage of it, Hadoop projects are new undertakings for many organizations.

Some industry analysts assert that Hadoop is still in its adolescence, a technology that needs more maturity and functionality before it's fully enterprise-ready. The release of Hadoop 2 in October 2013 added broader application support and features designed to improve cluster availability and scalability. Even so, Hadoop adoption remains relatively low: only 10% of the 284 respondents to a 2015 Gartner survey said their organizations were using it in production applications; another 16% said they were running pilot projects or experimenting with Hadoop, while 54% had no plans to use the technology.

Although Hadoop is open source software, it's by no means free. Companies implementing a Hadoop cluster generally choose one of the commercial distributions of the big data framework, which carry maintenance and support costs. Typically, Hadoop also must be used in combination with a range of complementary technologies from what is referred to as the Hadoop ecosystem. As a result, prospective users have to hire experienced programmers or train existing employees to work not only with Hadoop itself, but also with MapReduce and related technologies such as Hive, HBase, Spark and Pig.
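To make the MapReduce programming model mentioned above concrete, here is a minimal sketch in plain Python of the classic "word count" job. This is an illustration of the map-shuffle-reduce pattern only, not actual Hadoop code; in a real cluster, the framework distributes the map and reduce tasks across nodes and handles the shuffle itself.

```python
# Illustrative sketch of the MapReduce model (word count), not Hadoop code.
from collections import defaultdict


def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffle step: group emitted values by key, as the framework
    does between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


docs = ["big data on Hadoop", "Hadoop stores big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["hadoop"] == 2 and counts["big"] == 2
```

The appeal of the model is that the map and reduce functions are independent of how the work is distributed, which is what lets Hadoop parallelize a job across commodity servers.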

For many people, big data deployments and Hadoop projects are one and the same. That isn't the case, but Hadoop does have a central role to play in many big data management and analytics initiatives. Learn more about the Hadoop framework in this guide, which offers different perspectives on Hadoop's capabilities and looks at the technology's ongoing development, how it can benefit users and where it doesn't fully measure up to IT needs.

1. Keeping up with Hadoop news and trends

As with other technologies, Hadoop is continually evolving to meet shifting big data management needs and business goals. The articles in this section catalog Hadoop technology trends, offering a look at new functionality, expanding applications and supporting tools in the Hadoop ecosystem.

2. Examining issues and weaknesses in the Hadoop ecosystem

While many users find Hadoop projects to be cost-effective and useful, the technology has some drawbacks to keep in mind when assessing whether it's the right fit for an organization. In this section, users and analysts discuss where Hadoop falls short, particularly in terms of real costs, ease of management, performance and overall capability, and offer advice on how to avoid problems during deployments.

3. Analysis of Hadoop and big data technologies

Watch the video interviews in this section for analyses and insights into the issues involved in evaluating, deploying and managing Hadoop tools and big data technologies. Well-known consultants and industry analysts share tips on adoption of Hadoop and other big data tools and on how to implement successful big data management and analytics programs.

4. Test your understanding of the Hadoop ecosystem

Take this brief quiz to see what you have learned about Hadoop.