columnar database

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. 

The goal of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query.  

In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name for instance, to be accessed in columns as a group, rather than individually row-by-row. 

Here is an example of a simple database table with 4 columns and 3 rows.

ID         Last      First      Bonus

1          Doe      John      8000

2          Smith    Jane      4000

3          Beck     Sam      1000

In a row-oriented database management system, the data would be stored like this:  1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;   

In a column-oriented database management system, the data would be stored like this:  1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000; 

One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations — like MIN, MAX, SUM, COUNT and AVG— to be performed very rapidly.  Another benefit is that because a column-based DBMSs is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data. 

As the use of in-memory analytics increases, however, the relative benefits of row-oriented vs. column oriented databases may become less important. In-memory analytics is not concerned with efficiently reading and writing data to a hard disk.  Instead, it allows data to be queried in random access memory (RAM).




This was last updated in September 2010

Continue Reading About columnar database

Dig Deeper on Database management system (DBMS) architecture, design and strategy