Over the past few decades, most transaction processing and data warehousing systems have been built on relational database management systems. Now, however, many organizations are scaling up their analytical capabilities in ways that relational software can't support -- at least, not optimally.
For example, companies are increasingly looking to incorporate semi-structured and unstructured data into their analytics programs; that's hard to do with relational databases, which are primarily geared to structured transaction data. Additionally, data scientists and other analysts want to run analytical algorithms against all that data, something that often requires a high-powered big data platform.
That's where NoSQL databases enter the picture. They provide alternative means of managing, accessing and querying data compared to relational databases. They also multiply the database choices available to IT teams: Different types of NoSQL data stores are available for different kinds of applications, creating newfound flexibility in analytics architectures.
NoSQL technologies let organizations get around two aspects of relational databases that constrain how data can be used. First, the data in a relational database management system (RDBMS) is stored in a tabular structure: Each table consists of records organized in rows, with a fixed number of columns whose fields are populated with data values. Second, the data is organized according to a predefined schema that's difficult to adjust after the system has been put into production.
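To make those two constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for an RDBMS. The `customers` table, its columns and the sample values are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# A record that matches the predefined columns is accepted.
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (1, "Ada", "London"),
)

# A record carrying a field the schema doesn't define is rejected.
try:
    conn.execute(
        "INSERT INTO customers (id, name, city, loyalty_tier) VALUES (?, ?, ?, ?)",
        (2, "Grace", "New York", "gold"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Storing the new field requires an explicit schema change first.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
conn.execute(
    "INSERT INTO customers (id, name, city, loyalty_tier) VALUES (?, ?, ?, ?)",
    (2, "Grace", "New York", "gold"),
)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

In this toy case the schema change is a single statement. The sketch only shows that the change has to happen before the new data can be stored at all.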
The strict structures and schemas of relational databases are a good match for transaction processing applications, but they become limitations for many advanced analytics uses. As a result, RDBMS platforms often aren't suited to meeting the evolving analytics needs of organizations.
Less rigidity on data modeling rules
Data modeling for most NoSQL systems is more relaxed. They use an approach referred to variously as schema-less or schema-on-read modeling, in which no fixed schema is enforced when data is written; instead, structure is carried with each data element and interpreted as it's read. That allows for a more flexible data representation that isn't constrained by a static data structure, simplifying management of large and diverse data sets and expanding potential downstream uses of the data in different analytics applications.
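As a rough illustration of schema-on-read, the sketch below stores a few JSON records as-is and lets two different readers impose their own view of the structure at read time. The records and the function names `read_clicks` and `read_fields` are assumptions made up for this example, not part of any particular NoSQL product.

```python
import json

# Hypothetical raw records, stored as-is with no schema enforced on write.
raw_records = [
    '{"user": "ada", "clicks": 12}',
    '{"user": "grace", "clicks": 7, "referrer": "newsletter"}',
    '{"user": "alan", "tags": ["trial", "mobile"]}',
]

def read_clicks(raw):
    """One reader's view: a user name and a click count (0 if absent)."""
    record = json.loads(raw)
    return record["user"], record.get("clicks", 0)

def read_fields(raw):
    """A different view of the same data: which fields each record carries."""
    return sorted(json.loads(raw).keys())

clicks = [read_clicks(r) for r in raw_records]
fields = [read_fields(r) for r in raw_records]
```

The same stored records serve both readers without being reshaped, which is the flexibility that schema-on-read is meant to provide.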
NoSQL environments aren't uniform, either -- multiple groups of disparate technologies are collectively referred to as NoSQL products because of their nonrelational nature. There are four primary categories of NoSQL data stores.
Key-value stores. In these databases, data objects are associated with distinct character strings called keys, similar to the data structure known as an associative array. The key-value pair is a relatively simple concept: Unique keys are used to index entities and their associated values. Importantly, because a key-value store doesn't impose any constraints on the type or format of data elements, applications are free to interpret the data semantics on the fly.
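A minimal sketch of the key-value idea, with a plain Python dictionary standing in for the store. The `put` and `get` helpers and the key names are hypothetical; the point is that the store treats every value as opaque bytes and the application decides how to interpret them.

```python
import json

# A plain dict stands in for the key-value store (an assumption for illustration).
store = {}

def put(key, value):
    """Store opaque bytes under a unique string key."""
    store[key] = value

def get(key):
    """Return the bytes stored under the key, or None if the key is absent."""
    return store.get(key)

# Three values with three different formats; the store doesn't care.
put("session:42", json.dumps({"user": "ada", "cart": ["book"]}).encode("utf-8"))
put("counter:visits", b"1081")
put("avatar:ada", bytes([0x89, 0x50, 0x4E, 0x47]))

# The application interprets each value on the fly.
session = json.loads(get("session:42").decode("utf-8"))
visits = int(get("counter:visits").decode("utf-8"))
```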
Document databases. As in key-value stores, data objects in a document database are associated with and accessed using character-string keys. However, the items are stored in a document-like format and have some structure. They support different standard encodings, including Extensible Markup Language (XML); JSON; and BSON, which is a binary encoding of JSON objects. Document databases also embed metadata about data elements, helping to simplify the process of querying data based on its content.
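The difference from a key-value store can be sketched as follows: Because each value is a structured document, it can be queried by its content as well as by its key. This is an in-memory illustration only; the documents and the `get_path` and `find` helpers are hypothetical and don't correspond to any specific product's API.

```python
# Documents keyed by string ID; each document can have its own shape.
documents = {
    "order:1": {"customer": {"name": "Ada", "city": "London"}, "total": 23.5},
    "order:2": {"customer": {"name": "Grace", "city": "New York"}, "total": 1200.0, "gift": True},
    "order:3": {"customer": {"name": "Alan", "city": "London"}, "total": 45.0},
}

def get_path(doc, path):
    """Follow a dotted path such as 'customer.city'; return None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(path, value):
    """Return the keys of all documents whose field at `path` equals `value`."""
    return sorted(key for key, doc in documents.items() if get_path(doc, path) == value)

london_orders = find("customer.city", "London")
gift_orders = find("gift", True)
```

Only one of the three documents has a `gift` field, and the query simply skips the others rather than failing.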
Wide-column or table stores. The wide-column store model is descended from the design of Google's Bigtable database, which was detailed in a 2006 research paper and is still used by Google to run its search engine and other core applications. Wide-column stores spread data across tables with numerous columns to help speed up the querying of very large data volumes. Columns are based on key-value pairs and don't need to be identical in each row of a table, providing added flexibility for storing diverse data sets.
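The row-and-column layout described here can be approximated with nested dictionaries: Each row key maps to its own set of column-name/value pairs, so rows need not share the same columns. This is a simplified single-machine sketch; the `put` and `scan_column` helpers and the `profile:`/`activity:` column names are assumptions for illustration.

```python
from collections import defaultdict

# row key -> {column name -> value}; a simplified, hypothetical model.
table = defaultdict(dict)

def put(row_key, column, value):
    """Write one column value into a row, creating the row if needed."""
    table[row_key][column] = value

def scan_column(column):
    """Return (row key, value) pairs for rows that have the column, in key order."""
    return [(key, cols[column]) for key, cols in sorted(table.items()) if column in cols]

put("user#001", "profile:name", "Ada")
put("user#001", "profile:email", "ada@example.com")
put("user#002", "profile:name", "Grace")
put("user#002", "activity:last_login", "2024-01-15")

names = scan_column("profile:name")
logins = scan_column("activity:last_login")
columns_per_row = {key: sorted(cols) for key, cols in table.items()}
```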
Graph databases. These are something of a different animal from the previous three NoSQL database types. Graph databases map out relationships between data entities in graph form, organizing them as networks of interconnected nodes with labels that describe how the nodes are related to one another. The graphs can continue to grow in both size and complexity as more data is collected and the number of nodes and connections expands.
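A small sketch of the graph model: Nodes are connected by labeled edges, and queries work by following those labels from node to node. The node names, relationship labels and helper functions below are all hypothetical, and a real graph database would use its own query language rather than hand-written traversal code.

```python
from collections import deque

# (source node, relationship label, target node); all names are hypothetical.
edges = [
    ("ada", "WORKS_WITH", "grace"),
    ("grace", "WORKS_WITH", "alan"),
    ("ada", "PURCHASED", "laptop"),
    ("alan", "PURCHASED", "laptop"),
]

def neighbors(node, label):
    """Nodes reachable from `node` over one outgoing edge with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

def reachable(start, label):
    """Breadth-first traversal that follows only edges with the given label."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, label):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

# Everyone connected to "ada" through a chain of WORKS_WITH relationships.
colleagues = reachable("ada", "WORKS_WITH")
# Everyone with a PURCHASED edge pointing at "laptop".
buyers = sorted(src for src, lbl, dst in edges if lbl == "PURCHASED" and dst == "laptop")
```

Adding more data means appending more edges, which matches the way a graph grows as nodes and connections accumulate.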
Even more NoSQL data stores
Other types of data stores are also classified as NoSQL software -- for example, object databases that provide a hybrid approach designed to bridge schema-less data management and traditional relational data models. NoSQL vendors are increasingly merging the different technology styles into unified product offerings, creating multimodel databases that can support a combination of data models and applications. That potentially lets IT teams get the flexibility of NoSQL architectures without having to deploy multiple database management systems.
Overall, NoSQL data stores provide increased elasticity in the use of computing, storage and network bandwidth. Unlike relational databases, they don't force data to be persistently stored in particular formats or physical locations. Many also support integrated data caching that helps reduce data latency and speed up processing performance in a system configured with enough memory.
As a result, NoSQL systems can deliver fast throughput for both ingesting and querying data as well as enhanced scalability to support the collection and management of massive amounts of data.
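The caching effect mentioned above can be sketched with a simple read-through cache: The first read of a key goes to the slower backing store, and repeat reads are served from memory. The `backing_store` dictionary, the `stats` counter and the `get` helper are assumptions for illustration, not a description of how any particular product implements its cache.

```python
# Stand-in for slower persistent storage (hypothetical data).
backing_store = {"user:1": "Ada", "user:2": "Grace"}

cache = {}                     # in-memory cache
stats = {"backend_reads": 0}   # counts trips to the backing store

def get(key):
    """Return the value for `key`, reading the backing store only on a cache miss."""
    if key in cache:
        return cache[key]
    stats["backend_reads"] += 1
    value = backing_store.get(key)
    cache[key] = value
    return value

# Three reads of the same key cost only one trip to the backing store.
results = [get("user:1") for _ in range(3)]
```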