Denormalization is an approach to speeding up read-oriented data retrieval performance in a relational database, where the database administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized.
Using normalization in SQL, a database will store different but related types of data in separate logical tables, called relations. When a query combines data from multiple tables into a single result table, it is called a join. The performance of such a join in the face of complex queries is often the occasion for the administrator to explore the denormalization alternative.
Normalization vs. denormalization
Multiple joins in the same query can have a negative impact on performance. As the number of tables spawned by the design increases, the complexity of the joins can create computing overhead that is unacceptable. Joins can be I/O intensive, and related query execution can consume an inordinate amount of time, sometimes even crashing applications.
With database denormalization, data designers stitch together disparate tables. This happens at the expense of a number of data redundancies, but it can be useful for cutting down on the number of joins, and thus speeding operations. The trade-off for the database designer lies in creating database systems that have fewer large tables or more small tables, based on resources and requirements for interactivity of queries. The designer must also weigh this against problems of data redundancy, especially the possibility of increased data inconsistency.
After data has been duplicated, the database designer must take into account how multiple instances of the data will be maintained. One way to denormalize a database is to allow the database management system to store redundant data on disk. This has the added benefit of ensuring the consistency of redundant copies. As disk costs have fallen over the years, such approaches have grown in popularity. Makers of relational databases have enhanced databases to ensure better consistency when denormalization is employed.
Logical data design
One of the approaches to denormalization is to denormalize the actual logical data design. This can quickly lead to inconsistent data. Rules called constraints can be used to specify how redundant copies of information are synchronized, but they increase the complexity of the database design and also run the risk of impacting write performance.
Common cases where denormalization may effectively be used include pre-joined tables that package data for common uses, split tables tailored for specific user groups and combined tables that consolidate one-to-one and one-to-many relationships.
Denormalization in data warehousing and NoSQL
In relational data warehouses, denormalization often plays an important role. This is particularly true in dimensional databases as prescribed by influential data warehouse architect and author Ralph Kimball. His emphasis on dimensional structures that use denormalization is intended to speed query execution, which can be especially important in data warehouses used for business intelligence. Denormalization helps data warehouse administrators ensure more predictable performances.
Examples of denormalization go beyond denormalization in SQL. A type of denormalization also has use outside of relational databases. The technique is often employed in applications based on NoSQL databases, particularly document-oriented NoSQL databases. Such databases often underlie content management architectures for web profile pages that benefit from read-optimizations. The goal of denormalization in this context is to reduce the amount of time needed to assemble pages comprising disparate profile data. In such cases, maintaining data consistency becomes the job of the application, and, in turn, the application developer.