Google Bigtable

Robert Sheldon

What is Google Bigtable?

Google Bigtable is a distributed NoSQL database service created by Google to handle large amounts of structured, semistructured and unstructured data. Although Bigtable is available as a public subscription service, the platform also supports many of Google's own core services, including Google Search, Google Maps, Google Drive, Google Analytics and YouTube.

Bigtable was designed to support analytical and operational workloads requiring massive scalability. The platform can scale to petabytes of data, with workloads spread across thousands of commodity servers. It can also deliver single-digit millisecond latency, and Google promises up to 99.999% availability. Bigtable is based on a sparsely populated table design that can accommodate thousands of columns and billions of rows.

Bigtable is capable of processing more than 6 billion requests per second. Customers can seamlessly scale their operations from thousands to millions of read/write operations per second by adding or removing cluster nodes, and they can make these changes without incurring downtime. They can also automatically scale their clusters up or down to meet fluctuating workload demands. In addition, Bigtable supports automatic replication with eventual consistency.
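
To illustrate how scaling a cluster up or down with demand might be reasoned about, the following is a minimal sketch of a generic proportional scaling rule. It is not Bigtable's actual autoscaling algorithm. The function name, the CPU-based signal, the target percentage and the node limits are all assumptions made for illustration.

```python
def desired_node_count(current_nodes, cpu_percent, target_percent,
                       min_nodes, max_nodes):
    """Suggest a node count that moves average CPU toward a target.

    Hypothetical rule: scale the node count in proportion to how far
    the observed CPU percentage is from the target, round up, then
    clamp the result to the configured minimum and maximum.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")

    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (current_nodes * cpu_percent + target_percent - 1) // target_percent
    return max(min_nodes, min(max_nodes, raw))


if __name__ == "__main__":
    # Four nodes running at 90% CPU against a 60% target.
    print(desired_node_count(4, 90, 60, min_nodes=1, max_nodes=10))
```

Clamping to a minimum and maximum keeps a sudden spike or lull from pushing the cluster outside the limits an operator has chosen.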

According to Google, the Bigtable service can accommodate multiple types of data, making it possible to support a variety of workloads. This includes financial data, marketing data, graph data, time-series data and IoT data. The platform is well-suited to batch MapReduce operations, machine learning applications, and stream processing and analytics. The Bigtable service currently manages over 10 exabytes of data.

Figure: The four primary types of NoSQL databases. NoSQL databases like Google's Bigtable service are geared toward managing large sets of varied and frequently updated data.

How is data stored in Bigtable?

Bigtable stores data in scalable tables made up of columns and rows. Each table provides a sorted key/value map, with each row indexed by a single row key. Related columns are often grouped into column families and a unique name is assigned to each column family. The tables are also sparse; if a column does not contain data for a particular row, the cell does not use any storage space.

Within these tables, each row/column intersection can contain one or more cells. In a traditional relational database table, by contrast, each row/column intersection can contain only one cell. Every cell in a Bigtable table contains a unique timestamped version of the data. This approach enables Bigtable to maintain a record of how data has changed over time.
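
The data model described above can be sketched as a small in-memory structure. This is a toy model, not the Bigtable client API: the `SparseTable` class, its method names and the sample row keys are hypothetical, and plain strings stand in for the values a real table would hold.

```python
import bisect


class SparseTable:
    """Toy model of a sparse, sorted table with timestamped cell versions."""

    def __init__(self):
        # row key -> {(column family, column): [(timestamp, value), ...]}
        # A column with no data for a row simply has no entry (sparse).
        self._rows = {}
        # Row keys kept in sorted order so range scans are cheap.
        self._sorted_keys = []

    def write(self, row_key, family, column, value, timestamp):
        """Add a new timestamped version of a cell without overwriting old ones."""
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        cells = self._rows[row_key].setdefault((family, column), [])
        cells.append((timestamp, value))
        # Keep the newest version first.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def read_latest(self, row_key, family, column):
        """Return the newest value, or None if the cell was never written."""
        cells = self._rows.get(row_key, {}).get((family, column))
        return cells[0][1] if cells else None

    def read_versions(self, row_key, family, column):
        """Return all (timestamp, value) versions, newest first."""
        return list(self._rows.get(row_key, {}).get((family, column), []))

    def scan(self, start_key, end_key):
        """Return row keys in [start_key, end_key) in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, end_key)
        return self._sorted_keys[lo:hi]


if __name__ == "__main__":
    table = SparseTable()
    table.write("user#1", "profile", "email", "a@example.com", timestamp=100)
    table.write("user#1", "profile", "email", "a2@example.com", timestamp=200)
    print(table.read_versions("user#1", "profile", "email"))
```

Because each write appends a version rather than replacing the previous one, the history of a cell stays readable, which mirrors the record of change over time described above.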

The tables in a Bigtable database are sharded into blocks of contiguous rows that are referred to as tablets. Tablets are flexible structures that make it easier to balance workloads across server nodes. The tablets are stored in the SSTable format on Google's Colossus file system. An SSTable is a file that contains sorted key-value pairs. The file provides a key-to-value mapping that is ordered, persistent and immutable. Both the keys and values are arbitrary byte strings.
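
As a rough sketch of these two ideas, the code below splits sorted row keys into contiguous blocks and models an SSTable as an immutable, sorted key-to-value mapping over byte strings. The names `split_into_tablets`, `find_tablet` and `SSTable` are hypothetical, and splitting by a fixed row count is an assumption made to keep the example short; it says nothing about how Bigtable actually decides where to split tablets or how it lays out files on disk.

```python
import bisect


def split_into_tablets(sorted_row_keys, max_rows_per_tablet):
    """Split sorted row keys into contiguous blocks (toy tablets)."""
    if max_rows_per_tablet < 1:
        raise ValueError("max_rows_per_tablet must be at least 1")
    return [
        sorted_row_keys[i:i + max_rows_per_tablet]
        for i in range(0, len(sorted_row_keys), max_rows_per_tablet)
    ]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    if not tablets:
        raise ValueError("no tablets to search")
    start_keys = [tablet[0] for tablet in tablets]
    # The owning tablet is the last one whose first key is <= row_key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return max(index, 0)


class SSTable:
    """Toy immutable, sorted mapping from byte-string keys to byte-string values."""

    def __init__(self, items):
        pairs = sorted(items.items())
        # Tuples make the contents read-only after construction.
        self._keys = tuple(key for key, _ in pairs)
        self._values = tuple(value for _, value in pairs)

    def get(self, key):
        """Binary-search for key; return its value or None."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def items(self):
        """Iterate over (key, value) pairs in key order."""
        return zip(self._keys, self._values)


if __name__ == "__main__":
    tablets = split_into_tablets(["a", "b", "c", "d", "e"], 2)
    print(tablets, find_tablet(tablets, "c"))
```

Keeping rows sorted is what makes the contiguous split possible: any key can be routed to its block with a binary search over the blocks' first keys.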

Figure: Key differences between structured and unstructured data. A distributed NoSQL database service, Google Bigtable can handle large amounts of structured, semistructured and unstructured data.

The Bigtable service is organized into a hierarchy of components made up of instances, clusters and nodes. These components provide customers with an overall structure for working with their databases:

Instance. The instance sits at the top of the hierarchy. It offers a logical structure for deploying a Bigtable database, while providing a container for the customer's data. An instance requires more than one cluster to support replication. The storage type (SSD or HDD) is determined at the instance level.
Cluster. An instance contains one or more clusters. Each cluster is located in a specific zone. Google organizes its data services into regions, which contain individual zones. A single Bigtable instance can contain clusters in up to eight regions; however, each zone can hold only one of the instance's clusters.
Node. A cluster contains one or more nodes. The nodes provide the compute resources necessary to drive the Bigtable operations. A tablet is associated with a specific node at any given time. The node tracks the tablets that are assigned to that node. A node never stores the data, only points to it. In this way, Bigtable can quickly rebalance nodes by updating the pointers. It can also seamlessly recover from node failure without losing data. In addition, the nodes handle incoming read/write requests for the tablets and perform maintenance tasks on them, such as compacting the data.
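
The separation between nodes and stored data can be sketched as follows. This is a simplified illustration, not Bigtable's implementation: the `Cluster` class, the node and tablet names, and the round-robin assignment are all assumptions. The point it shows is that rebalancing or recovering from a node failure changes only the tablet-to-node mapping, never the stored data.

```python
class Cluster:
    """Toy cluster in which nodes hold pointers to tablets, not the data."""

    def __init__(self, node_names, tablet_ids):
        self.nodes = list(node_names)
        if not self.nodes:
            raise ValueError("a cluster needs at least one node")
        # tablet id -> node name. Only this mapping changes on rebalance.
        self.assignment = {tablet_id: None for tablet_id in tablet_ids}
        self._rebalance()

    def _rebalance(self):
        """Spread tablets across nodes round-robin (an assumed policy)."""
        for i, tablet_id in enumerate(sorted(self.assignment)):
            self.assignment[tablet_id] = self.nodes[i % len(self.nodes)]

    def add_node(self, name):
        self.nodes.append(name)
        self._rebalance()

    def remove_node(self, name):
        """Simulate node removal or failure by reassigning its tablets."""
        if len(self.nodes) <= 1:
            raise ValueError("cannot remove the last node")
        self.nodes.remove(name)
        self._rebalance()

    def node_for(self, tablet_id):
        return self.assignment[tablet_id]


if __name__ == "__main__":
    # Shared storage lives outside the nodes (a stand-in for the file system).
    storage = {"tablet-a": b"rows a-f", "tablet-b": b"rows g-m", "tablet-c": b"rows n-z"}
    cluster = Cluster(["node-1", "node-2"], storage)
    cluster.remove_node("node-1")
    print(cluster.assignment)  # every tablet now points at node-2
    print(storage)             # stored data is untouched
```

Because no data is copied when a pointer moves, reassignment is cheap, which is why this design allows fast rebalancing and recovery.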

All client requests to the Bigtable service go through a front-end server pool, which forwards the requests to the individual Bigtable nodes. The nodes then communicate with their respective tablets. By adding nodes to a cluster, customers can increase the number of simultaneous requests that their clusters can handle, while also increasing the cluster's maximum throughput.

Google has maintained Bigtable as a proprietary, in-house technology. Nevertheless, Bigtable has had a large impact on NoSQL database design. In 2006, Google software developers publicly disclosed Bigtable details in a technical paper presented at the USENIX Symposium on Operating Systems Design and Implementation (OSDI).

The paper's thorough description of Bigtable's inner workings allowed other organizations and open source development teams to create database systems modeled after Bigtable. Those systems include the Apache HBase database, which runs on top of the Hadoop Distributed File System (HDFS); Apache Cassandra, which originated at Facebook; and Hypertable, an open source project that ended development in 2016.

This was last updated in January 2024
