This content is part of the Buyer's Guide: Investigating Hadoop distributions: Which is right for you?

Inside the Microsoft Azure HDInsight cloud infrastructure

Azure HDInsight is a cloud implementation of Apache Hadoop that provides a software framework designed for processing, analyzing and reporting on big data.

Microsoft Azure HDInsight is designed to help users quickly and cost-effectively deploy and use Hadoop and other Apache big data analysis and processing products. To support the service, Microsoft leverages its managed Azure cloud infrastructure, enabling users to provision a Hadoop cluster without having to purchase, install and configure the necessary hardware and software. Azure HDInsight also lets users resize the environment on demand by spinning up additional nodes to handle their computing capacity needs -- from terabytes to petabytes.

The service uses the Hortonworks Data Platform (HDP) Hadoop distribution and includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie and Ambari, as well as other Apache products. Additional components can be installed as part of provisioning a cluster by executing scripts. Several scripts are provided by HDInsight to install and configure Hue, Giraph, R and Solr, enabling users to create scripts of their own to install other Apache components. HDInsight also integrates with business intelligence tools, including Power BI, Excel, SQL Server Analysis Services and SQL Server Reporting Services.

Functions of Microsoft Azure HDInsight

Users can quickly set up an HDI cluster through the Azure portal to specify the cluster type, Hadoop, HBASE, Storm or Spark; the operating system of the cluster, Linux or Windows; and the HDI version to use (as described below). The cluster type the user selects drives the number of base nodes and each of their roles in that cluster type. The HDInsight pricing page provides the detailed layout for each of these cluster types.

HDInsight supports multiple Hadoop cluster versions at any time, with each tied to a specific version of the HDP. The current default, HDInsight version 3.2, is based on HDP version 2.2. This version includes Apache Hadoop & YARN (2.6.0), Apache Tez (0.5.2), Apache Pig (0.14), Apache Hive and HCatalog (0.14.0), Apache HBase (0.98.4), Apache Sqoop (1.4.5), Apache Oozie (4.1.0), Apache Zookeeper (3.4.6), Apache Storm (0.9.3), Apache Mahout (0.9.0), Apache Phoenix (4.2.0) and Apache Spark (1.3.1), as well as other Apache products.

Six additional versions of HDInsight are currently supported: HDI 1.6, HDI 2.1, HDI 3.0, HDI 3.1, HDI 3.3 and HDI 3.4. Each is based on different versions of HDP, respectively: HDP 1.1, HDP 1.3, HDP 2.0, HDP 2.1, HDP 2.3 and HDP 2.4. And each includes different versions of Hadoop and other Apache big data products.

Through the use of the Azure SDK for .NET, developers can integrate their Visual Studio Integrated Development Environment with their HDI cluster by installing the HDInsight Tools for Visual Studio and the Microsoft Hive Open Database Connectivity driver. The SDK allows developers to connect and navigate HDI Insight Hive databases and linked storage accounts for HDInsight clusters, create tables, and create and run Hive queries.

Pricing and support for Azure HDInsight

Customers are billed from the time a cluster is created to when it's deleted. They can estimate their cost based on the Azure features they need using the pricing calculator. Different cluster types -- Hadoop, HBASE, STORM, Spark -- have different minimal node configurations. Pricing is based on hourly charges per node and node instance type -- compute power and memory. There are additional fees for storage and data transfer. 

Microsoft offers a 30-day free trial and a $200 credit to use in Azure. The trial account is decommissioned once the 30-day trial has expired if the user hasn't upgraded to a pay-as-you-go Azure subscription.

Microsoft Azure offers several support subscriptions, including technical support for Hadoop as well as other Azure services. The Hadoop support is backed by Hortonworks, the distributors of the Hadoop distribution deployed in Azure HDInsight.

Next Steps

Watch out for these Hadoop performance bottlenecks

What you need to know before you evaluate Hadoop distributions

A major milestone for Hadoop

Dig Deeper on Hadoop framework