Exploring Hadoop distributions for managing big data
Companies of all sizes can use Hadoop, as vendors sell Hadoop distributions bundled with different levels of support, as well as enhanced commercial distributions.
Companies of all sizes can use Apache Hadoop to manage the massive volumes of structured, semi-structured and unstructured data being generated by such sources as social media, Internet of Things devices and mobile sensors. The Hadoop framework comprises several open source software components with a set of core modules that capture, process, manage and analyze big data.
Although developers can download Hadoop directly from the Apache website and build an environment on their own, the open source Hadoop framework is limited. Organizations that need more robust features, maintenance and support are turning to commercial Hadoop software distributions.
Vendors package their enterprise Hadoop distributions with different levels of support and also offer enhanced commercial distributions. Because the software is open source, you don't purchase a Hadoop distribution as a product, but rather as an annual support subscription.
In this buyer's guide, we outline the ways commercial Hadoop distributions can benefit your organization as well as the features these offerings provide. We also analyze the top Hadoop distributions, examining key characteristics of each, including the deployment model, data protection, security and support. To help you further narrow your search, we provide in-depth descriptions of the six leading subscriptions.
1. Making a case for a Hadoop software distribution
To determine whether one of the commercial Hadoop distributions is right for your organization, you must first identify the applications you need to support.
2. Hadoop distributions offer value-added functionality
Expert David Loshin explores some value-added supplements to the code base and key features offered by commercial Hadoop distributions, including performance and functionality capabilities, maintenance, and support.
3. Which Hadoop distribution is right for my organization?
Learn what key characteristics must be considered as you evaluate the top Hadoop distributions.
4. The top Hadoop distributions
Here we provide an in-depth look at each of the six Hadoop distributions analyzed in the final article in this series. We examine the specific components, the platforms each distribution is supported on, each vendor's service and support model, and the cost of the subscriptions.
A look at Amazon Elastic MapReduce cloud-based Hadoop
The Amazon Elastic MapReduce Web service offers a managed Hadoop framework that enables users to distribute and process big data across dynamically scalable Amazon EC2 instances.
Learn more about the Cloudera Hadoop distribution
The Cloudera distribution including Apache Hadoop provides an analytics platform and the latest open source technologies to store, process, discover, model and serve large amounts of data.
Inside the Hortonworks open enterprise Hadoop distribution
The Hortonworks Data Platform consists entirely of projects built through the Apache Software Foundation and provides an open source environment for data collection, processing and analysis.
Inside the IBM BigInsights platform for big data management
The latest version of IBM BigInsights offers several value-add services that can be used with its core distribution of open source Hadoop for managing big data.
Inside the Microsoft Azure HDInsight cloud infrastructure
Azure HDInsight is a cloud implementation of Apache Hadoop that provides a software framework designed for processing, analyzing and reporting on big data.
Inside the MapR Hadoop distribution for managing big data
The MapR Hadoop distribution replaces HDFS with its proprietary file system, MapR-FS, which is designed to provide more efficient management of data, reliability and ease of use.