CDH, the Cloudera Hadoop distribution, includes several related open source projects, such as Impala and Search. It also provides security and integration with several hardware and software products.
The Impala framework in Cloudera's Distribution Including Apache Hadoop (CDH) allows users to execute interactive SQL queries directly against data stored in the Hadoop Distributed File System (HDFS), Apache HBase or the Amazon Simple Storage Service (S3). Impala reuses several technologies and components from Hive, including its SQL syntax (Hive SQL), its Open Database Connectivity (ODBC) driver and its query UI (Hue, which Hive also uses).
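As a rough illustration of what an interactive SQL query looks like from client code, the sketch below runs an aggregate query through a generic Python DB-API 2.0 cursor. The table name `web_logs`, its columns and the helper names are hypothetical and not part of CDH. The demo uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; it does not exercise Impala itself.

```python
import sqlite3


def count_by_status(cursor, limit=5):
    """Run a simple aggregate query through any DB-API 2.0 cursor.

    The table `web_logs` and its `status` column are hypothetical.
    """
    cursor.execute(
        "SELECT status, COUNT(*) AS n FROM web_logs "
        "GROUP BY status ORDER BY n DESC LIMIT %d" % int(limit)
    )
    return cursor.fetchall()


def demo_cursor():
    """Build an in-memory SQLite stand-in with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE web_logs (status INTEGER, path TEXT)")
    cur.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [(200, "/"), (200, "/about"), (404, "/missing")],
    )
    return cur


if __name__ == "__main__":
    for status, n in count_by_status(demo_cursor()):
        print(status, n)
```

Against a real cluster, the same function could in principle be handed a cursor from a DB-API client for Impala, such as the open source impyla package (`from impala.dbapi import connect`). The host name, port and table would depend on the deployment.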
As part of CDH, Cloudera Search incorporates Apache Solr, a data indexing and search platform based on Lucene. The integration of this technology as part of CDH provides users with near real-time indexing of and access to data directly stored in Hadoop and HBase. Solr indexing and search technology enables users to perform complex textual searches while requiring little or no SQL or programming skills. Solr also allows for queries to be performed directly against the Hadoop data store, removing the need to move large data sets to perform complex queries.
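To make the "little or no SQL" point concrete, the following sketch builds a request URL for Solr's standard `select` handler using only the Python standard library. The host, the collection name `logs` and the field name `level` are assumptions chosen for illustration; a real Cloudera Search deployment will have its own collections and schema.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, rows=10):
    """Return a URL for Solr's /select handler with a JSON response.

    base_url   -- e.g. "http://localhost:8983/solr" (assumed host and port)
    collection -- hypothetical collection name
    query      -- text in Solr's query syntax, e.g. 'level:ERROR'
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "logs", "level:ERROR"))
```

The resulting URL can be fetched with any HTTP client; the search itself is expressed as a short query string rather than as SQL or a program.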
Other related Apache open source projects included in CDH are Flume, HBase, Hive, Hue, Oozie, Spark, Sqoop and Sentry (incubating).
Editions of the Cloudera Hadoop distribution
Cloudera offers several implementation editions of CDH that provide differing levels of cluster and service management capabilities as well as different levels of support:
Cloudera Express is free to use and includes CDH, as well as core features of Cloudera Manager.
Cloudera Manager provides CDH administrators with an intuitive Web-based management console to deploy, manage, monitor and diagnose issues with CDH deployments. The tool also includes an API that can be used to programmatically configure the system and collect metric and health information about a CDH cluster.
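As a hedged sketch of how health information from that API might be consumed, the function below filters a JSON list of services down to those not reporting good health. It assumes a response shaped like `{"items": [{"name": ..., "healthSummary": ...}]}`, as returned by an endpoint along the lines of `/api/v<version>/clusters/<cluster>/services`. The exact path, API version and field names should be checked against the API documentation for the Cloudera Manager release in use. The sample payload is made up for the example.

```python
import json


def unhealthy_services(services_json):
    """Return the sorted names of services whose health is not "GOOD".

    Assumes each item carries "name" and "healthSummary" fields; this
    shape is an assumption to verify against the actual API response.
    """
    payload = json.loads(services_json)
    return sorted(
        item["name"]
        for item in payload.get("items", [])
        if item.get("healthSummary") != "GOOD"
    )


# Made-up sample payload, used only to demonstrate the function.
SAMPLE = json.dumps({
    "items": [
        {"name": "hdfs", "healthSummary": "GOOD"},
        {"name": "impala", "healthSummary": "CONCERNING"},
        {"name": "solr", "healthSummary": "BAD"},
    ]
})

if __name__ == "__main__":
    print(unhealthy_services(SAMPLE))
```

In practice the JSON text would come from an authenticated HTTP request to the Cloudera Manager server rather than from a hard-coded sample.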
Cloudera Enterprise is a licensed edition that provides extended capabilities to CDH with the inclusion of additional advanced features from Cloudera Manager and Navigator. Technical support options are also available to customers that have purchased an enterprise license. Cloudera Enterprise is available in three editions, each offering varying levels of service management capabilities:
- The Basic edition provides management capabilities to support a cluster running core CDH services that include HDFS, Hive, Hue, MapReduce, Oozie, Sqoop, Yet Another Resource Negotiator (YARN) and ZooKeeper.
- The Flex edition supports the management of a cluster running core CDH services plus one of the following: Accumulo, HBase, Impala, Navigator, Solr or Spark.
- The Data Hub edition supports the management of a cluster running core CDH services plus any of the following: Accumulo, HBase, Impala, Navigator, Solr or Spark.
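The edition rules above can be sketched as a small lookup: given the services a cluster needs to run, which is the smallest Cloudera Enterprise edition that covers them? The function and constant names are hypothetical, and the logic only encodes the three bullet points as written here, not Cloudera's actual licensing terms.

```python
# Core services covered by every Cloudera Enterprise edition (per the list above).
CORE_SERVICES = frozenset(
    {"HDFS", "Hive", "Hue", "MapReduce", "Oozie", "Sqoop", "YARN", "ZooKeeper"}
)
# Additional components: Flex covers one of them, Data Hub covers any of them.
ADD_ONS = frozenset({"Accumulo", "HBase", "Impala", "Navigator", "Solr", "Spark"})


def minimum_edition(services):
    """Return the smallest edition that manages all requested services."""
    requested = set(services)
    unknown = requested - CORE_SERVICES - ADD_ONS
    if unknown:
        raise ValueError(f"Unrecognized services: {sorted(unknown)}")
    extras = requested & ADD_ONS
    if not extras:
        return "Basic"
    if len(extras) == 1:
        return "Flex"
    return "Data Hub"


if __name__ == "__main__":
    print(minimum_edition(["HDFS", "Hive", "Impala"]))
```

For example, a cluster that needs HDFS, Hive and Impala maps to Flex, while adding Spark to that set maps to Data Hub.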
Cloudera Manager Advanced Features add the following to the core product capabilities provided with Cloudera Express: operational reporting, quota management, configuration history and rollbacks, rolling updates and service restarts, direct Active Directory (AD) Kerberos integration, Lightweight Directory Access Protocol (LDAP) integration, Simple Network Management Protocol (SNMP) support, support integration with scheduled diagnostics and automated disaster recovery.
Cloudera Navigator, which is available only for the Flex and Data Hub editions, enables users to manage data security and governance for the CDH platform, supporting an organization's compliance and regulatory requirements. The tool can help data managers, analysts and administrators explore the large amounts of data in Hadoop, and it simplifies management of the encryption keys used to secure data residing in CDH clusters.
Cloudera Hadoop distribution products are supported on Red Hat Enterprise Linux/CentOS 6.6 (in Security-Enhanced Linux mode), 6.7 and 7.1, and on Oracle Enterprise Linux 7.
Cloudera offers users several options for installing and implementing its products:
QuickStart VM provides users with a free-to-use virtual machine -- VMware, VirtualBox or Kernel-based VM -- running CentOS 6.4 and a single Apache Hadoop cluster along with example data, queries, scripts and Cloudera Manager to manage the cluster. Cloudera QuickStart VMs are intended for demo purposes only.
Cloudera Manager is used for installing and managing Cloudera implementations -- both Express and Enterprise Editions. A license is required to install the Enterprise edition. Installation of Cloudera Express provides users with an optional 60-day trial of Cloudera Enterprise.
Cloudera Director provides self-service users with the ability to deploy and manage Cloudera Enterprise in a variety of cloud environments.
For users interested in manually installing the product, Cloudera provides a version for download that can be run on the operating systems mentioned above.
Cloudera Hadoop distribution licensing, pricing and support
Cloudera Enterprise annual subscriptions vary based on the edition or tier purchased and the number of nodes being run. Contact Cloudera for detailed pricing.
Cloudera offers several support options to organizations that have purchased Enterprise edition licenses; support isn't available to users of Cloudera Express. Business-hour and 24/7 support options are available to all Enterprise license holders. Premium support options, which include a 15-minute response time for critical issues, are available only to organizations with Flex or Data Hub edition licenses.
Cloudera provides training and certification through Cloudera University, which offers both on-demand and private training. Courses and certifications are offered in three tracks: developers, administrators and analysts.