This article is part of an Essential Guide, our editor-selected collection of our best articles, videos and other content on this topic. Explore more in this guide:
4. - Brush up on your Hadoop vocabulary: Read more in this section
Explore other sections in this guide:
Apache HBase is a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS). Hadoop is a framework for handling large datasets in a distributed computing environment.
HBase is designed to support high table-update rates and to scale out horizontally in distributed compute clusters. Its focus on scale enables it to support very large database tables -- for example, ones containing billions of rows and millions of columns. Currently, one of the most prominent uses of HBase is as a structured data handler for Facebook's basic messaging infrastructure.
HBase is known for providing strong data consistency on reads and writes, which distinguishes it from other NoSQL databases. Much like Hadoop, an important aspect of the HBase architecture is the use of master nodes to manage region servers that distribute and process parts of data tables.
HBase is part of a long list of Apache Hadoop add-ons that includes tools such as Hive, Pig and ZooKeeper. Like Hadoop, HBase is typically programmed using Java, not SQL. As an open source project, its development is managed by the Apache Software Foundation. HBase became a top-level Apache project in 2010.