How to select the best DBMS software: A buyer's guide
A collection of articles that takes you from defining technology needs to purchasing options
Apache Cassandra is an open source, distributed key-value NoSQL DBMS. It was originally developed at Facebook and later released as an open source project. Additionally, a free packaged distribution of Apache Cassandra -- DataStax Community Edition -- and a commercial edition of Apache Cassandra are available from DataStax.
Apache Cassandra was created for online applications that require fast performance and no downtime. As a key-value database system, it excels when most, if not all, access is to look up data based on a primary key value. It was built to handle very large amounts of data spread out across commodity servers and to deliver high availability without a single point of failure.
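To make the idea of primary-key lookups across many servers concrete, here is a minimal sketch of a hash ring that maps a key to a set of replica nodes. It is a toy model, not Cassandra's implementation: the node names, the `ToyRing` class and the use of MD5 as the hash are all assumptions chosen for illustration, and Cassandra's own partitioner and replication strategies are more involved.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a ring position (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Places nodes on a hash ring and picks replicas by walking clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, primary_key: str):
        """Return the nodes that would hold a copy of the row for this key."""
        start = bisect.bisect_right(self._tokens, token(primary_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster keeping three copies of every row.
ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes, so the row survives the loss of any one of them
```

Because the key alone determines which nodes hold the row, a lookup by primary key can go straight to a replica without consulting a central coordinator, which is the property that removes the single point of failure in this model.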
Available for Linux, Windows and Mac OS X operating systems, Apache Cassandra is open source and free to download.
Apache Cassandra features
At the time of writing, the latest release of Apache Cassandra is 3.7. Cassandra releases are delivered monthly, with even-numbered releases (e.g., 3.2) delivering new features and odd-numbered releases providing bug fixes only. Recent improvements delivered in the version 3 cycle include improved Windows support, a refactored storage engine and support for materialized views.
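The even/odd numbering rule can be expressed as a simple parity check. The helper below is a hypothetical illustration of the scheme as described here, not part of any Cassandra tooling.

```python
def release_kind(version: str) -> str:
    """Classify a release by the parity of its minor version number."""
    minor = int(version.split(".")[1])
    return "feature" if minor % 2 == 0 else "bug-fix"


for v in ("3.2", "3.7"):
    print(v, release_kind(v))
```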
Downloading and installing the package will set up Apache Cassandra to run on a single node. Although this setup is supported, Apache Cassandra is more commonly run as a multi-node cluster, which requires additional setup steps. Clusters are defined in the Cassandra configuration files, chiefly cassandra.yaml, which also hold the other settings for the DBMS.
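As a rough guide to what those additional setup steps involve, the sketch below assembles the cassandra.yaml values that usually have to be set on each node of a cluster. The setting names are standard cassandra.yaml keys, but the cluster name, IP addresses, choice of snitch and the `node_settings` helper are assumptions made for this example; check the configuration reference for your version before relying on any of them.

```python
def node_settings(cluster_name, node_ip, seed_ips):
    """Build the cassandra.yaml values that typically differ from a single-node install."""
    return {
        "cluster_name": cluster_name,   # must be identical on every node
        "listen_address": node_ip,      # address other nodes use to reach this one
        "rpc_address": node_ip,         # address client applications connect to
        "endpoint_snitch": "GossipingPropertyFileSnitch",  # assumed choice
        "seed_provider": [{
            "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
            # Seeds are the contact points a starting node uses to find the cluster.
            "parameters": [{"seeds": ",".join(seed_ips)}],
        }],
    }


# Hypothetical three-node cluster; the first two nodes act as seeds.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
seeds = nodes[:2]
configs = {ip: node_settings("demo_cluster", ip, seeds) for ip in nodes}
print(configs["10.0.0.3"]["seed_provider"][0]["parameters"][0]["seeds"])
```

Every node gets the same cluster name and seed list, while the listen and RPC addresses are specific to each machine.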
DataStax Enterprise, built on Apache Cassandra, offers a multi-model DBMS platform for the enterprise. DataStax Enterprise provides support for key-value, tabular, JavaScript Object Notation (JSON)/document and graph data models. The design of DataStax Enterprise Graph is inspired by the open source Titan graph database, but is integrated with Cassandra. Because data from all models is stored in a single persistence layer, each data model inherits all of the benefits of Cassandra as well as the enterprise-grade functionality of DataStax Enterprise.
DataStax Enterprise, like Apache Cassandra, is designed for online applications requiring high speed and availability, but augmented with enterprise development and management capabilities.
Apache Cassandra licensing
Subscriptions to DataStax Enterprise are available for both production and non-production environments. Both include certified software and support from DataStax. DataStax Enterprise is free to use in development environments; production use requires the purchase of a license or enrollment in the startup program.
The DataStax Startup Program is available for startup companies with less than $2 million in annual revenue and less than $20 million in capital raised. This program offers unlimited, free use of DataStax Enterprise; there's no limit on the number of nodes, and there are no hidden restrictions. Support services available from DataStax include 24/7/365 service-level agreements, certified service packs that keep your software up to date, and hot-fix support for emergency maintenance.
DataStax isn't the only commercial support option for Apache Cassandra; other companies also offer support.
Apache Cassandra data types
The Apache Cassandra NoSQL DBMS supports the most common data types, including ascii, bigint, blob, boolean, counter, decimal, double, float, int, text, timestamp, uuid, varchar and varint.
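For application developers, it can help to see how these types line up with values in a client language. The table below follows the mapping commonly used by the DataStax Python driver, reproduced here from memory as an assumption rather than an authoritative reference; the schema, the sample rows and the `check_row` helper are hypothetical.

```python
import datetime
import decimal
import uuid

# CQL type -> Python type (assumed to match the DataStax Python driver's usual mapping).
CQL_TO_PYTHON = {
    "ascii": str, "text": str, "varchar": str,
    "bigint": int, "int": int, "varint": int, "counter": int,
    "decimal": decimal.Decimal,
    "double": float, "float": float,
    "boolean": bool,
    "blob": bytes,
    "timestamp": datetime.datetime,
    "uuid": uuid.UUID,
}


def check_row(schema, row):
    """Return the columns whose Python value does not match the declared CQL type."""
    return [col for col, cql_type in schema.items()
            if not isinstance(row[col], CQL_TO_PYTHON[cql_type])]


# Hypothetical table definition and rows.
schema = {"id": "uuid", "name": "text", "balance": "decimal", "created": "timestamp"}
good_row = {
    "id": uuid.uuid4(),
    "name": "Ada",
    "balance": decimal.Decimal("10.50"),
    "created": datetime.datetime(2016, 6, 1, 12, 0),
}
bad_row = dict(good_row, balance=10.5)  # a float where a decimal is declared

print(check_row(schema, good_row))
print(check_row(schema, bad_row))
```

The check is deliberately simple. For instance, Python treats `bool` as a subclass of `int`, so a stricter validator would need extra handling for integer columns.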
Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and built-in caching.
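The phrase "log-structured updates" refers to a write path that never modifies existing data in place. The following toy store sketches the idea under simplifying assumptions: writes go to an in-memory table, full tables are flushed as immutable segments, and reads check the newest data first. The class and its parameters are hypothetical, and the real engine adds a commit log, on-disk files and compaction, all of which are omitted here.

```python
class ToyLogStructuredStore:
    """A toy write path: in-memory table plus immutable flushed segments."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.memtable = {}   # recent writes, held in memory
        self.segments = []   # immutable flushed segments, oldest first

    def put(self, key, value):
        # A write only adds to the memtable; older segments are never rewritten.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            # Flush the memtable as a sorted, read-only segment.
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None


store = ToyLogStructuredStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # second write triggers a flush
store.put("a", 3)   # newer value for "a" lives in the memtable
print(store.get("a"), store.get("b"), len(store.segments))  # 3 2 1
```

Writes stay cheap because they are sequential appends; the cost is shifted to reads, which may need to consult several segments.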
Data access is performed using Cassandra Query Language (CQL), which resembles SQL.
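To show how closely CQL resembles SQL, the sketch below holds a table definition and builds a lookup by primary key as plain strings. The keyspace, table and column names are hypothetical, and the `%s` placeholder follows the parameter style of the DataStax Python driver for non-prepared statements. Nothing here connects to a database; running the statements would require that driver and a live cluster.

```python
# Hypothetical table keyed by a single primary key column.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS shop.users (
    user_id uuid PRIMARY KEY,
    name text,
    signup timestamp
)
"""


def select_by_key(table, key_column, columns):
    """Build a CQL SELECT that looks up one row by its primary key.

    Identifiers are interpolated directly, so pass only trusted names;
    the key value itself is left as a bound parameter (%s).
    """
    return "SELECT {} FROM {} WHERE {} = %s".format(
        ", ".join(columns), table, key_column
    )


query = select_by_key("shop.users", "user_id", ["name", "signup"])
print(query)  # SELECT name, signup FROM shop.users WHERE user_id = %s
```

The statements read like SQL, but the lookup is driven by the primary key, in line with the key-based access pattern the database is designed around.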
Apache Cassandra benchmarks
Engineers at the University of Toronto conducted a benchmark of NoSQL database engines in 2012. The study concluded that "Cassandra's throughput dominated in all the tests; however, its latency was in all tests peculiarly high."
About the author
Craig S. Mullins is a data management strategist, researcher, consultant and author with more than 30 years of experience in all facets of database systems development. He is president and principal consultant of Mullins Consulting Inc. and publisher/editor of TheDatabaseSite.com. Email him at firstname.lastname@example.org.