Definition

Apache Hive

This definition is part of our Essential Guide: Managing Hadoop projects: What you need to know to succeed
Contributor(s): Delaney Rebernik

Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for handling large datasets in a distributed computing environment.

Hive has three main functions: data summarization, query and analysis.  It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop. In addition, HiveQL supports custom MapReduce scripts to be plugged into queries. Hive also enables data serialization/deserialization and increases flexibility in schema design by including a system catalog called Hive-Metastore.

According to the Apache Hive wiki, "Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (like web logs)." 

Hive supports text files (also called flat files), SequenceFiles (flat files consisting of binary key/value pairs) and RCFiles (Record Columnar Files which store columns of a table in a columnar database way.)

This was last updated in November 2012

Continue Reading About Apache Hive

PRO+

Content

Find more PRO+ content and other member only offers, here.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchBusinessAnalytics

SearchAWS

SearchContentManagement

SearchOracle

SearchSAP

SearchSOA

SearchSQLServer

Close