Investigating Hadoop distributions: Which is right for you?
A collection of articles that takes you from defining technology needs to purchasing options
The Hortonworks Data Platform (HDP) enables users to store, process and analyze massive volumes of data from many sources and in many formats. At its core, the scalable open enterprise Hadoop platform includes YARN and the Hadoop Distributed File System (HDFS), a fault-tolerant storage system that holds large amounts of data in a variety of formats for processing.
YARN (Yet Another Resource Negotiator), a core part of the open source Hadoop project, provides centralized resource management for Hadoop's data processing workload across various processing methods, including interactive SQL, real-time streaming, data science and batch processing. Other enterprise-grade functions supported include data governance, security and common operations support.
With its recent announcement of release 2.4, Hortonworks indicated it will be providing more frequent releases as part of its Extended HDP services. This will give customers access to interim releases of, and innovations in, non-Core Hadoop modules such as Hive, HBase, Storm and Spark.
HDP Core modules that include Hadoop Distributed File System, YARN and MapReduce will continue to be provided on a single-release-per-year schedule aligned with the Open Data Platform Initiative core Apache-compatible version.
This approach will enable customers who use Hadoop Core modules for critical functions such as data storage to stabilize on less-frequent releases of the more mature core modules. At the same time, this strategy will provide more frequent releases to other customers who are interested in benefiting from those more rapidly evolving Hadoop modules.
HDP 2.4 includes Apache Hadoop 2.7.1 (Core HDP modules) as well as Spark 1.6, HBase 1.1.2, Kafka 0.9.0 and Ambari 2.2.1 as the Extended HDP services.
Hortonworks DataFlow (HDF), a separate product that works with HDP, is designed to automate real-time data flows of all types and to collect and curate real-time business insights and actions derived from data from any source. The product is powered by the Apache NiFi open source project, which is intended to address the challenges presented by the Internet of Anything (IoAT). Unlike the Internet of Things, which is associated with just sensor and machine data, IoAT also includes clickstream data and social stream data.
Hortonworks open enterprise Hadoop offers three installation options:
- Hortonworks Sandbox on a virtual machine, a virtualized environment that runs on Mac or Windows in VMware or VirtualBox and provides a personal Apache Hadoop environment intended for prototyping and training purposes.
- Hortonworks Sandbox in the cloud, a cloud-based HDP implementation currently available in Microsoft Azure with a one-month free trial.
- HDP 2.3.2 Ready for the Enterprise, which provides automated installation on Linux and Unix environments using Ambari. Other installation methods include manual installation using RPM Package Manager for Unix and Linux environments; cloud installation using Cloudbreak for Azure, Amazon Web Services and OpenStack; and Windows installation for Windows Server 2008 and 2012.
Hortonworks Data Platform licensing and support
Aside from optional add-ons and third-party components, Hortonworks Data Platform components are covered under the Apache 2.0 license.
Hortonworks offers the following support subscriptions, designed to cover the entire lifecycle from proof of concept to production deployment and operations:
- HDP Jumpstart, which is intended for early-stage data development work. It provides users with a six-month support term for three named contacts during normal business hours. The committed response time for all severity types is one business day.
- HDP Enterprise, which is intended for business-critical operational support. It provides users with a one-year term and supports named contacts based on cluster size. Support is provided 24/7 via phone and Web requests, with a one-hour response time for severity 1 issues, four hours for severity 2 issues, eight hours for severity 3 issues and one business day for severity 4 issues.
- HDP Enterprise Plus, which provides the same level of support as HDP Enterprise but adds support for modules that HDP Enterprise support doesn't cover: Accumulo, Atlas, Storm, Ranger, Spark, Kafka and Cloudbreak.
- HDP Enterprise Premier Support, which offers clients designated on-site and personalized support. Premier is available only to clients with existing active enterprise-level support for HDP or HDF.
Contact Hortonworks for pricing information.