How to select the best DBMS software: A buyer's guide
A collection of articles that takes you from defining technology needs to purchasing options
Although relational database management system (RDBMS) products continue to rule the roost in terms of installations and usage, NoSQL database technology represents the fastest-growing type of DBMS being adopted today.
NoSQL describes a broad category of database systems that, in some cases, may have dramatically different capabilities and use cases. Coined in the late 1990s, the term originally meant, quite literally, no SQL. Over time, as reality set in and the practicality of exposing data to a SQL interface became apparent, NoSQL morphed in meaning to NO SQL, where the NO stands for Not Only. Today, NoSQL databases are considered next-generation databases, as they're generally non-relational, distributed, open source and horizontally scalable.
The four main types of NoSQL DBMSes and their uses
A key/value database is ideal when data is accessed by using a key, such as looking up a book by its International Standard Book Number (ISBN). Here, the ISBN is the key and the value is the rest of the information about the book. The key must be known and can therefore be queried, but the value is a blob that is opaque to the database and must be interpreted by the application after it has been read.
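The ISBN lookup can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for the key/value store, and the names `store`, `put_book` and `get_book`, the JSON encoding and the sample ISBN are all assumptions rather than any product's API.

```python
import json
from typing import Optional

# A plain dict stands in for the key/value store (assumption for illustration).
store = {}

def put_book(isbn: str, book: dict) -> None:
    """Serialize the book and store it under its ISBN; the store sees only bytes."""
    store[isbn] = json.dumps(book).encode("utf-8")

def get_book(isbn: str) -> Optional[dict]:
    """Fetch by key, then interpret the opaque value in application code."""
    raw = store.get(isbn)
    return None if raw is None else json.loads(raw.decode("utf-8"))

# Hypothetical ISBN and book details.
put_book("978-0-00-000000-2", {"title": "Example Title", "author": "A. Writer"})
book = get_book("978-0-00-000000-2")
print(book["title"])
```

Note that nothing in the sketch can find a book by title or author; only the key is queryable, which is the trade-off described above.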
A column DBMS -- also known as a column store or wide column store -- stores data as families of columns that define a record. With traditional relational databases, data is modeled as rows of columns with access always by row. The NoSQL wide column store manages records in column families capable of holding a large number of dynamic columns. A row in a column store can span a large number of columns that might require many relational tables. There is no fixed schema, meaning column names and keys can vary. A column database is well-suited for data where writes are uncommon; atomicity, consistency, isolation and durability (ACID) isn't a hard requirement; and the schema is variable.
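As a rough sketch of the column-family idea, nested Python dicts can model a row key that maps to column families, each holding whatever columns that row happens to need. The data and the helper names `read_columns` and `write_column` are hypothetical and don't correspond to a specific wide column product.

```python
# Row key -> column family -> {column name: value}. Nested dicts stand in
# for a wide column store; all names and data here are hypothetical.
rows = {
    "user:1": {"profile": {"name": "Ana", "city": "Lisbon"},
               "activity": {"2016-09-01": "login"}},
    "user:2": {"profile": {"name": "Raj", "phone": "555-0100", "language": "hi"}},
}

def read_columns(row_key, family):
    """Return the columns of one family for one row, or an empty dict."""
    return rows.get(row_key, {}).get(family, {})

def write_column(row_key, family, column, value):
    """Add or overwrite one column; a new column needs no schema change."""
    rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

write_column("user:1", "profile", "nickname", "ani")
print(sorted(read_columns("user:1", "profile")))
print(sorted(read_columns("user:2", "profile")))
```

The two rows end up with different column names in the same family, which is what "no fixed schema" means in practice.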
A graph database focuses on relationships between values, and stores data using the mathematical concept of a graph. Graph databases use graph structures with nodes, edges and properties to represent and store data. In a graph database, every element contains a direct pointer to its adjacent element, and no index lookups are necessary.
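The idea of elements pointing directly at their neighbors can be illustrated with a small Python sketch. The `Node` class, the `KNOWS` relationship label and the sample people are assumptions made for the example; real graph databases expose their own storage formats and query languages.

```python
class Node:
    """A graph node that holds direct references to its neighbors."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (relationship label, target Node)

    def connect(self, label, target):
        """Add a directed, labeled edge from this node to the target."""
        self.edges.append((label, target))

    def neighbors(self, label):
        """Follow stored references; no global index is consulted."""
        return [target for rel, target in self.edges if rel == label]

alice = Node("Alice", role="engineer")
bob = Node("Bob", role="analyst")
carol = Node("Carol", role="manager")
alice.connect("KNOWS", bob)
bob.connect("KNOWS", carol)

print([n.name for n in alice.neighbors("KNOWS")])
```

Traversing from Alice to Bob to Carol touches only the nodes on the path, regardless of how many other nodes the graph contains.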
NoSQL DBMSes are becoming popular for web-scale, big data and analytical implementations. Each category of NoSQL DBMS is practical for different types of applications and uses, thereby requiring the introduction of a term that's frequently used within the NoSQL community: polyglot persistence, or using different database systems for different applications and use cases based on how the database handles the needs of the application. Therefore, the NoSQL mantra is to use the right DBMS for the specific need, even if it means introducing a new database system.
Pros and cons of NoSQL
So why would you consider a NoSQL DBMS instead of a relational DBMS? Perhaps the greatest strength of a NoSQL DBMS is its decentralized, scalable, fault-tolerant nature. Most NoSQL database technology is implemented to scale and survive outages.
With NoSQL databases you can customize your data management solution for each specific use case. Whereas relational databases are engineered to be widely practical across various use cases, each NoSQL DBMS is designed for specific uses. By embracing polyglot persistence, an organization can choose the database technology that best suits each particular use.
Additionally, most NoSQL products are lighter weight and therefore require less overhead than a relational solution. Since NoSQL products are designed for specific use cases and problem sets, there's less functionality than with most relational DBMSes, which are designed for a broader set of uses. So a NoSQL DBMS itself consists of less code, which potentially confers a performance benefit over a more complex DBMS.
Of course, NoSQL has its disadvantages, too. Consider ACID support, which is standard for a relational DBMS but is lacking in many (though not all) NoSQL DBMSes. If ACID support is crucial, you must investigate whether or not the NoSQL database you're targeting offers ACID.
Another downside of using NoSQL databases is the lack of SQL support. In its 40-plus year life span, SQL has grown into the lingua franca of data access. A DBMS that doesn't support SQL requires developers to learn different programming techniques to access its data. As with ACID support, some NoSQL databases have added SQL capabilities, though usually not as full-featured as in relational databases. In other words, don't expect to take a relational SQL query and run it on a NoSQL database without making significant changes.
The NoSQL market today is also very confusing. There are literally hundreds of different NoSQL databases from which to choose. And there's no formal data model like there is with relational (set theory), so each NoSQL DBMS can, and usually will, be very different from any other -- even sometimes within the same type of offering. This confusing mish-mash of approaches can make it difficult to succeed with a NoSQL approach unless a significant amount of investigation and due diligence is applied.
NoSQL DBMS use cases
Given that the raison d'être for NoSQL is to apply data persistence techniques that are well-suited for specific use cases, let's examine when it makes sense to use each type of NoSQL database:
Key/value. The key/value database is designed to be good for the high-availability, low-latency requirements of applications such as gaming, retail and mobile. The schema flexibility of key/value databases helps them excel at session management, serving ad content and managing user or product profiles. In other words, when data is encoded in many different ways without a rigorous schema, using a key/value database makes sense.
Key/value databases aren't particularly good for managing complex relationships between different sets of data or for querying using anything other than the defined key.
Document database. This type of database excels at enabling you to store different kinds of data for each document with the capability to flexibly search across all the data. Document stores can be a good choice when your schema isn't rigid but you still require the ability to query by something other than just a single key. Document databases work well for event logging, online shopping, content management and in-depth analytical processing. The schema flexibility of document databases can also be useful for projects requiring rapid prototyping.
Unfortunately, document databases aren't particularly well-suited for complex transaction processing. For applications that require data aggregation, a document database isn't a good idea because the flexible schema means the data isn't consistent across all the documents and is unlikely to be usefully aggregated.
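Both points about document databases, flexible querying and awkward aggregation, can be seen in a short Python sketch. A list of dicts stands in for a document collection here; the `find` helper, the field names and the sample data are hypothetical and not modeled on a particular product's query API.

```python
# A list of dicts stands in for a document collection (hypothetical data).
documents = [
    {"_id": 1, "type": "order", "customer": "ana", "total": 40.0},
    {"_id": 2, "type": "order", "customer": "raj", "amount": 15.5},  # different field name
    {"_id": 3, "type": "event", "customer": "ana", "action": "login"},
]

def find(collection, **criteria):
    """Return documents whose fields match every criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

# Querying by any field works, not only by the document key.
ana_docs = find(documents, customer="ana")
orders = find(documents, type="order")

# Summing "total" silently skips the order that stored its value as "amount".
total_sum = sum(doc["total"] for doc in orders if "total" in doc)
print(len(ana_docs), total_sum)
```

The sum covers only one of the two orders because the documents don't agree on a field name, which is the kind of inconsistency that makes aggregation unreliable without extra cleanup.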
Column store. This type of database stores data in column families as rows. There are many columns associated with a key (or identifier). As with the other types of NoSQL DBMSes, the schema is flexible; a column family can be composed of different columns for each row. Additionally, data in a column store can be accessed by columns other than the key.
The concept of a column database isn't new, with variants of the idea implemented as a relational database in the past (e.g., SAP Sybase IQ and IBM DB2 BLU). But the relational column database differs from the NoSQL column store. The relational column database was designed for sparse data and analytical processing, whereas the NoSQL column store was designed for varying schema consisting of a large number of columns.
Column stores are efficient for systems where writes are rare and you frequently need to read many columns of a record at once. Column stores work well for event logging, content management, and counting and/or categorizing for analytics. Column stores are also useful when you have expiring data because it's possible to set up a column to automatically expire.
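Automatic column expiry can be approximated with a time-to-live (TTL) attached to each column. The sketch below is one hedged way to model it in Python; the `ExpiringColumns` class, its lazy deletion on read and the injected fake clock are assumptions for illustration, and actual products implement expiry in their own ways.

```python
import time

class ExpiringColumns:
    """One row's columns, each with an optional expiry time (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._columns = {}  # column name -> (value, expiry time or None)

    def put(self, column, value, ttl_seconds=None):
        """Store a column; with a TTL it expires that many seconds from now."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._columns[column] = (value, expires_at)

    def get(self, column):
        """Return the value, or None if the column is missing or expired."""
        value, expires_at = self._columns.get(column, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            del self._columns[column]  # lazily drop the expired column
            return None
        return value

# A fake clock makes the example deterministic.
now = [0.0]
row = ExpiringColumns(clock=lambda: now[0])
row.put("session_token", "abc123", ttl_seconds=30)
row.put("user_name", "ana")
now[0] = 31.0  # advance time past the TTL
print(row.get("session_token"), row.get("user_name"))
```

The application never deletes the session token explicitly; it simply stops being returned once its TTL has elapsed, while the column without a TTL persists.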
You may wish to avoid column stores for systems with wildly varying queries because you may have to redesign the column families. Aggregating data across rows is not efficient with column stores, and column stores aren't well-suited for ACID transactions.
Graph database. Of all the types of NoSQL DBMS, this one probably differs the most from the traditional concept of a database system. Graph databases are specifically designed for situations where data elements are interconnected and there's an undetermined number of relationships between them. The most common use case for graph databases is to implement a social media network, such as LinkedIn or Facebook. Of course, there are other applications such as delivery routing and dispatching, location-aware systems, public transportation links, road maps, curriculum prerequisites and network topologies. Another practical application for graph databases is to support a recommendation engine, such as those used by online retail sites.
Graph databases aren't particularly well-suited for frequently changing data and real-time updates across large amounts of data. Additionally, if you plan to partition the database across a network, graph databases will likely experience performance degradation.
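The recommendation engine use case mentioned above amounts to a short graph traversal: from a customer to the products they bought, to other customers who bought those products, and on to what else those customers bought. The following Python sketch assumes a tiny in-memory graph; the `purchases` data and the `recommend` function are hypothetical, and the scoring (a simple count) is a deliberately naive choice.

```python
from collections import Counter

# Adjacency sets stand in for a graph database: customer -> products bought.
purchases = {
    "ana": {"laptop", "mouse"},
    "raj": {"laptop", "mouse", "keyboard"},
    "lee": {"laptop", "monitor", "keyboard"},
    "kim": {"phone"},
}

def recommend(customer, top_n=2):
    """Suggest products bought by customers who share a purchase with `customer`."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other != customer and owned & items:  # shares at least one product
            scores.update(items - owned)          # count products not yet owned
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked][:top_n]

print(recommend("ana"))
```

For "ana", the keyboard ranks first because two similar customers bought it, and the monitor second because one did.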
Six additional NoSQL DBMS factors to consider
The many differences within the NoSQL landscape impose a degree of difficulty when evaluating NoSQL technology. There are different architectures for different database types and products. There are even differences within the types of NoSQL DBMS, and there are no standards, including no standard way of accessing data (as opposed to the relational market). This means each DBMS has its own tools and application programming interfaces for accessing data, which must be adopted and learned before the database can be used. Here are some additional considerations:
Rapid change. The realm of NoSQL is dynamic, with improved features, added functionality and even new products being introduced. It can be difficult to keep up with the latest and greatest capabilities as you evaluate NoSQL database systems.
Growing support for ACID capabilities. One of the early selling points for NoSQL was that it supported transactions that didn't require full ACID support, thereby slimming down the DBMS and improving performance. Instead of ACID, NoSQL promoted Basically Available Soft-state services with Eventual-consistency (BASE). Nevertheless, many applications rely on ACID and the ability to support ACID transactions is a growing and desirable capability of a NoSQL DBMS.
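To show what eventual consistency means in practice, here is a minimal Python sketch of two replicas that temporarily disagree and then converge. The `Replica` class, the `sync` function and the highest-version-wins rule are simplifying assumptions for illustration; production systems use more elaborate replication and conflict resolution.

```python
class Replica:
    """A replica holding key -> (version, value); the higher version wins."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        """Accept the write only if it is newer than what is stored."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(a, b):
    """Exchange entries in both directions so both replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, version, value)
    for key, (version, value) in list(b.data.items()):
        a.write(key, version, value)

r1, r2 = Replica(), Replica()
r1.write("cart:ana", 1, ["mouse"])
r1.write("cart:ana", 2, ["mouse", "keyboard"])  # accepted by r1 only
stale = r2.read("cart:ana")                      # r2 has not seen the write yet
sync(r1, r2)                                     # stands in for background replication
fresh = r2.read("cart:ana")
print(stale, fresh)
```

Between the write and the sync, a reader on the second replica sees stale data. An ACID transaction would not allow that window, which is the trade-off BASE accepts in exchange for availability.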
Lack of multi-platform support. Most NoSQL DBMSes were born of the open source movement, and therefore usually run on Linux (or perhaps a Unix variant). If you need to implement your DBMS on Windows or the mainframe, you will need to review the commercial products, which more commonly permit deployment on multiple platforms.
Increasing support for SQL. Without SQL, querying is usually very basic and may require complex coding in a high-level language. Of course, this differs from product to product, but it's a good idea to look for SQL support in a NoSQL DBMS because there are many developers who have SQL coding knowledge. You can also add higher-level query support for some NoSQL databases using tools such as Apache Hive, which offers a SQL-like language, and Apache Pig, which offers a dataflow scripting language.
Ability to exploit multiple types of databases. There are some NoSQL DBMSes that allow you to model and implement data using flexible combinations of key-value pairs, documents and graphs. Additionally, relational DBMSes are beginning to adopt NoSQL capabilities. Using a DBMS that can exploit multiple types of database storage can make it easier for your organization to adopt polyglot persistence.
Beware of pre-V1 technology. Many open source projects run for years without a version 1 option. The software may be robust and useful, but risk-averse organizations typically avoid running pre-V1 technology in their production environment.
Understanding NoSQL databases
Although NoSQL database technology enjoys a lot of market awareness today, the landscape is still crowded, confusing and rapidly changing. Understanding NoSQL databases requires digging into multiple types of database engines with varying use cases. Choosing the wrong NoSQL technology for a project can cause the project to fail.
The NoSQL DBMS, however, is being adopted successfully in many projects today, and the underlying technology offers advantages when deployed in the correct manner on appropriate projects.
About the author:
Craig S. Mullins is a data management strategist, researcher, consultant and author with more than 30 years of experience in all facets of database systems development. He is president and principal consultant of Mullins Consulting Inc. and publisher/editor of TheDatabaseSite.com. Email him at email@example.com.
This article was updated in September 2016.