This article originally appeared on the BeyeNETWORK.
Much like a mountain climber looking back over the terrain already covered, we find it worthwhile to reflect periodically on what we have done. Unfortunately, we often get so caught up in daily challenges that we do not see history as it is being made.
With this idea in mind, we can go back and examine the history of the database.
The origins of databases can be traced to the first programs that were written, although those origins were not obvious at the time. These first programs focused almost exclusively on algorithms and coding languages. Data was considered a by-product of function, or processing, and was basically an afterthought. Input and output volumes were also low, which kept the emphasis on algorithms.
But practitioners better understood the value of data. When early computer languages were applied to business, getting data into and out of the program was essential for processing.
Before long, programs grew into applications, which in turn led to master files. The master file was a compilation of data that was collected, stored and used by other programs. These master files were almost always stored on magnetic tape and accessed sequentially. One shortcoming of master files was that you had to access 100 percent of the data to get at the 5 percent you needed. Another was that multi-file merges were clumsy and inefficient, and organizations struggled with them. Even so, master files were an improvement over the technology that preceded them.
Disk storage became common soon after this. With disk storage, data could be accessed directly. It was no longer necessary to read the entire file to get at a single unit of data. This led to the idea of a database. Early theoreticians defined a database as “a single place where all processing is done,” and that definition was sufficient for the time. It was reinforced by the wide dispersion of data that had been characteristic of master files.
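The difference between sequential and direct access can be sketched in a few lines of Python. This is only an illustration, not a model of any particular historical system: the fixed-length record layout, the account numbers and the function names are all assumptions made for the example, and an in-memory buffer stands in for both tape and disk.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte account id, 8-byte balance.
RECORD = struct.Struct("<Id")

def build_master_file(n):
    """Write n records, ordered by account id, into an in-memory file."""
    buf = io.BytesIO()
    for account_id in range(n):
        buf.write(RECORD.pack(account_id, account_id * 10.0))
    buf.seek(0)
    return buf

def sequential_lookup(f, wanted_id):
    """Tape-style access: read from the start until the record turns up."""
    f.seek(0)
    reads = 0
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return None, reads  # reached the end without a match
        reads += 1
        account_id, balance = RECORD.unpack(chunk)
        if account_id == wanted_id:
            return balance, reads

def direct_lookup(f, wanted_id):
    """Disk-style access: compute the record's offset and jump to it."""
    f.seek(wanted_id * RECORD.size)
    chunk = f.read(RECORD.size)
    if len(chunk) < RECORD.size:
        return None, 0  # offset is past the end of the file
    _, balance = RECORD.unpack(chunk)
    return balance, 1

master = build_master_file(1000)
print(sequential_lookup(master, 950))  # balance plus the number of records read
print(direct_lookup(master, 950))      # same balance after a single read
```

Both functions return the same balance, but the sequential version has to read every record that comes before the one it wants, while the direct version reads exactly one.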
Shortly after disk storage and databases became common, online systems were introduced. Online transactions could be created and processed for the first time. Such transactions, which were not even possible when data was stored sequentially, allowed for new types of applications, including banking, airline reservations and manufacturing. In short, online databases put electronic processing directly into the hands of users. The computer had become a vital part of everyday business, and most daily transactions were now conducted over it.
Soon there were online applications available everywhere.
The personal computer also became common at this time. It was not initially associated with the large online mainframe computers. In fact, many people did not even recognize that database processing could be done on a personal computer. Compared to the size and sophistication of database processing on mainframes, database processing on personal computers was done on a very different scale. But it was indeed occurring there. Eventually, the database processing available on personal computers (now called workstations) grew in sophistication and capability.
As the personal computer became more prevalent, individual users felt they should be able to do their own processing. Interestingly enough, this idea arose from the spreadsheet, which made independent processing accessible. The end user no longer needed IT to say what could or could not be done. Following spreadsheets, more sophisticated forms of end-user-controlled technology emerged, first fourth-generation languages and then multi-dimensional analytical tools.
Not long after this, data and data processing were everywhere in the organization: in the online systems, with the customers and with the end users. Data permeated the corporation without restraint. There was very little control over data, virtually no data integrity, and little reason to believe the information that resulted.
Data warehouses arose from this environment. The data warehouse was a fundamentally different type of database. It was integrated, which meant that multiple applications had to agree on both the meaning and the content of data. It contained history, something that online applications avoided because it reduced their speed. It was also granular, and because of that granularity the data could be reshaped to support unknown future information needs. Data warehouses effectively became a standard part of the information processing infrastructure. Early database theoreticians hated this development since it directly opposed their original concept of a database as the single place where all data processing was done. There were now many different kinds of databases: online databases, data warehouses and analytical databases.
Once data warehousing became common, numerous other database types emerged. One example was the data mart, a multi-dimensional database based on the star schema. Data marts accommodated the needs of a single department or of a group of users who needed the same things. Another type was the Operational Data Store (ODS), where real-time analytic processing could be done. There were also exploration databases and data mining databases, where exploratory and data mining processing was done. Clearly, the data warehouse produced an explosion of new database types.
Today, we continue to search for more types of databases. In the future, we hope to create databases for unstructured data, archival processing and near-line processing.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.