Life -- and database architecture design -- used to be so simple when I started working as a database administrator in the early 1980s. There was just one database management system (DBMS) that really mattered: the IMS hierarchical database from IBM, with a rival called IDMS also in the picture.
In both cases, you found database records via a key and navigated from one record to another, or scanned them in sequence. But there was a problem: IMS, which had been around since 1968, lacked the ability to easily search records.
That led to the development of the relational database, based on the idea of grouping related records into tables and using a language -- SQL -- that didn't specify how to access the records; instead it left that task to the DBMS itself.
First commercialized in 1979, relational databases were adopted more and more widely as the 1980s unfolded. Those years seemed to end any argument about database styles. Oracle, Microsoft SQL Server, IBM DB2, Ingres and Informix -- all relational databases -- slugged it out for commercial dominance, with the first three eventually emerging as the undisputed market leaders.
Sure, there were a few specialist database holdouts, like Adabas, but only academics and a few oddballs were really arguing about what a database should look like. The early 1990s did see some brief excitement around object databases; they never took off beyond a tiny niche, though, and relational databases continued to rule the DBMS planet.
For technology planners in large enterprises, too, things were easy: You just picked one of the big vendors and standardized on its database.
A changing database picture
What a difference another couple of decades makes. A range of new database technologies now poses a threat to the cozy oligopoly enjoyed by Oracle, Microsoft and IBM. The picture has changed for several reasons, and database architecture design practices in many organizations have changed along with it.
First, the top vendors optimized their DBMS software for transaction processing and were slow to adapt to the growing need for read-only reporting and analytics, in which large numbers of records have to be accessed in ways that are hard to predict. Teradata attacked that problem with its data warehouse database, but that was still relational technology.
Another approach was to reshape the database itself. Instead of storing all the values of each table row together in adjacent disk blocks, developers created columnar databases that store all the values of each column together. Because an analytical query typically reads only a few columns, and because values within a single column tend to compress well, this dramatically sped up read-only access for analytical processing. MonetDB, Vertica and Greenplum are examples of the columnar approach.
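To make the row-versus-column distinction concrete, here is a minimal Python sketch, assuming a tiny hypothetical sales table. It is only a toy model of storage layout, not how MonetDB, Vertica or any other product is implemented: the "values touched" counter stands in for the disk I/O a real engine would do, and all table, column and function names are made up for illustration.

```python
# Toy model of row-oriented vs. column-oriented layout.
# The table, column names and "values touched" counter are hypothetical,
# used only to illustrate why columnar layouts help analytical scans.

COLUMNS = ("order_id", "region", "product", "amount")

# Row layout: each record's fields are stored together.
ROWS = [
    (1, "north", "widget", 120.0),
    (2, "south", "gadget", 75.5),
    (3, "north", "gadget", 42.25),
    (4, "east", "widget", 310.0),
]


def to_columnar(rows, columns):
    """Pivot row tuples into one list per column (column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def total_amount_row_store(rows, columns):
    """SUM(amount) over the row layout.

    Every field of every row is read, even though only 'amount' is needed.
    Returns (total, number of values touched).
    """
    idx = columns.index("amount")
    total = 0.0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # the whole row comes along for the ride
        total += row[idx]
    return total, values_touched


def total_amount_column_store(table):
    """SUM(amount) over the column layout: only the 'amount' column is read.

    Returns (total, number of values touched).
    """
    amounts = table["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    table = to_columnar(ROWS, COLUMNS)
    print(total_amount_row_store(ROWS, COLUMNS))  # (547.75, 16)
    print(total_amount_column_store(table))       # (547.75, 4)
```

Both layouts give the same answer, but the row layout reads all 16 stored values while the column layout reads only the 4 it needs. With hundreds of columns and billions of rows, that gap is the basic reason columnar engines suit read-heavy analytics, while row layouts remain a natural fit for transactions that read or write whole records.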
Things became more complex as companies started looking to use databases to store things other than neatly structured numbers and short text fields. Relational databases were never good at handling large volumes of unstructured text, audio files or videos, and the huge increase in internet and big data applications, accompanied by an explosion of data volumes, shone a bright light on that limitation and the need for more flexible database platforms.
The last 10 years have seen the rise of new database alternatives to address those issues -- in particular, NoSQL systems, which don't require a fixed database schema and are generally designed to scale out across many servers. To meet different application needs, there are multiple categories of NoSQL software, including wide-column stores, document databases, key-value stores and graph databases -- as shown in the chart below. The ranks of NoSQL databases include Cassandra, MongoDB, HBase, Couchbase, Neo4j and seemingly countless others, many of them open source.
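What "no fixed schema" means in practice can be sketched in a few lines of Python. The `DocumentCollection` class below is entirely hypothetical: it is a toy, in-memory stand-in and does not mirror the API of MongoDB, Couchbase or any other product. It only shows that documents in one collection can carry different fields, and that a query simply skips documents lacking a field instead of failing against a predefined table definition.

```python
import itertools
from typing import Any


class DocumentCollection:
    """Toy, in-memory stand-in for a document-store collection.

    Hypothetical illustration only; not the API of any real NoSQL product.
    """

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def insert(self, doc: dict[str, Any]) -> int:
        # No schema check: any set of fields is accepted as-is.
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        # Match documents whose fields equal every criterion. A document
        # that lacks a field simply doesn't match; nothing raises an error.
        return [
            {"_id": doc_id, **doc}
            for doc_id, doc in self._docs.items()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    customers = DocumentCollection()
    customers.insert({"name": "Ada", "country": "UK"})
    customers.insert({"name": "Lin", "country": "SG", "loyalty_tier": "gold"})
    customers.insert({"name": "Raj", "phones": ["+91-555-0100"]})
    print(customers.find(country="SG"))
    # [{'_id': 2, 'name': 'Lin', 'country': 'SG', 'loyalty_tier': 'gold'}]
```

In a relational design, the optional loyalty tier and the list of phone numbers would typically need schema changes or extra tables before the data could be stored. Here each document just carries whatever fields it has, which is the flexibility (and, without discipline, the risk) that schema-free systems offer.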
To add to the mix, databases are increasingly optimized to keep as much data as possible in memory rather than on disk in order to improve performance -- SAP's HANA is one example. There are also NewSQL databases that try to combine the scalability of the NoSQL approach with the transactional consistency of relational software, as well as multimodel databases that support different types of data models.
More options, more complexity
The life of the enterprise architect has become considerably more complex as a result of this technology profusion. The DB-Engines website lists more than 300 databases in the popularity rankings it calculates; there are more than two dozen graph databases alone, and even more key-value and document stores.
The old strategy of just choosing one of the mega-vendor relational databases simply doesn't suffice anymore. Instead, a modern enterprise architect has to plan for a world where relational, NoSQL and analytical databases coexist with one another, and possibly with assorted other DBMS exotica, as well as the Hadoop-based data lakes that are washing over the shores of the enterprise IT landscape.
As part of the database architecture design process, enterprise architects need to weigh the features available in different products against the economic benefits of technology standardization, such as skills reuse and more efficient procurement. Realistically, with IT purchasing often decentralized, the best that most architects may be able to do is to classify the main use cases in their organizations and recommend database software that suits each of them.
But that's at least better than allowing a complete free-for-all. With so many options, a database architecture plan needs to guide developers to make sensible choices for their data processing workloads. It's a fast-changing environment, and in the absence of sufficient guidance, chaos may ensue.
"May you live in interesting times" is apparently not a real Chinese curse, but the times are certainly interesting for those responsible for database architecture design. Chairman Mao's "Let a hundred flowers blossom" slogan, a real one from the 1950s, now seems to apply to database platforms as well -- as both a blessing and, if you aren't careful, a potential curse.