A columnar database is a database management system (DBMS) that stores data in columns instead of rows.
The goal of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query.
In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name for instance, to be accessed in columns as a group, rather than individually row-by-row.
Here is an example of a simple database table with 4 columns and 3 rows.
ID Last First Bonus
1 Doe John 8000
2 Smith Jane 4000
3 Beck Sam 1000
In a row-oriented database management system, the data would be stored like this: 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
In a column-oriented database management system, the data would be stored like this: 1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations — like MIN, MAX, SUM, COUNT and AVG— to be performed very rapidly. Another benefit is that because a column-based DBMSs is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
As the use of in-memory analytics increases, however, the relative benefits of row-oriented vs. column oriented databases may become less important. In-memory analytics is not concerned with efficiently reading and writing data to a hard disk. Instead, it allows data to be queried in random access memory (RAM).