Redundancy (again!)

As everyday examples prove, redundancy of data, even in databases, is okay.

This article originally appeared on the BeyeNETWORK.

Some bad ideas just won’t die, however much they are proven to be false. The other day, I was listening to a theoretician talk about the way we should build our databases, and he mentioned the evils of redundancy.

Flash back to the world of applications in the 1960s and the 1970s. There were all sorts of applications, including accounts payable, accounts receivable, human resources and general ledger – and this is just the short list. With each application came a master file. One day, someone looked up and saw that there was a lot of redundancy between applications. And with this redundancy came a cry: “The sky is falling! Save us from redundancy!”

And it was true: with the redundancy of data came great confusion about the value of a particular unit of information. No one knew what the real value, the definitive value, was. So redundancy was declared the culprit. If we could just get rid of redundancy, we wouldn’t have all of these problems.

That Chicken Little attitude permeated database theory. So it was surprising to hear this line of thinking still alive and being discussed. It was so 1970s.

So what is the problem with redundancy? The truth is that in real life, redundancy is everywhere. It is simply a part of everyday life, and we don’t have a problem with it at all.

Take time, for example. People have watches (lots of watches) for telling time. The time is shown on television. There is a phone number you can call to learn the time. There are clocks on stoves and dashboards. Time is massively redundant, and we don’t have a problem with it. If you think your watch is running fast or slow, you look for a more definitive source and make a correction.

Take stock quotes as another example. They are everywhere. Stock quotes are printed in the Wall Street Journal and in the business section of your daily newspaper. You can view quotes on your PC and on bank signs. If someone suspects that a stock has been misquoted, that person simply checks a more definitive source. Is there any problem with redundancy? None.

The simple fact of the matter is that we find redundancy of information everywhere in our day-to-day lives. So what is the problem with redundancy of information in computer systems?

The answer is that we can easily live with redundant information in our information systems, provided we have an architecture and the discipline of a system of record. In other words, we need to ensure the integrity of information even when that information is redundant. We can’t go creating and updating data anywhere we please. There needs to be one definitive source of information where all creation and all updates occur; data is then copied from that source for every other purpose. The system of record is to information what the atomic clock is to time: the official national atomic clock is the definitive source, and all other sources of time are set against it. Or, the system of record is like a stock trade as it occurs on the floor of the New York Stock Exchange: all other references to the trade derive from the actual transaction as it occurred.
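To make the system-of-record discipline concrete, here is a minimal Python sketch, assuming a simple in-memory key/value store. The class names `SystemOfRecord` and `ReadOnlyCopy` and the version counter are hypothetical illustrations, not taken from any product or from the article. Every create and update goes through the system of record, and each redundant copy is refreshed only from it, much as a watch is set against the atomic clock.

```python
from typing import Any


class SystemOfRecord:
    """Hypothetical definitive source: the only place data is created or updated."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self.version = 0  # bumped on every change so copies can detect drift

    def upsert(self, key: str, value: Any) -> None:
        # All creation and all updates happen here and nowhere else.
        self._data[key] = value
        self.version += 1

    def snapshot(self) -> tuple[dict[str, Any], int]:
        # Hand out a copy, so changes made downstream cannot alter the definitive data.
        return dict(self._data), self.version


class ReadOnlyCopy:
    """Hypothetical redundant copy: read freely, refreshed only from the system of record.

    It exposes no update method, so by convention data is never changed here.
    """

    def __init__(self, source: SystemOfRecord) -> None:
        self._source = source
        self.refresh()

    def refresh(self) -> None:
        # Like setting a watch against the atomic clock.
        self._data, self._version = self._source.snapshot()

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def is_stale(self) -> bool:
        # A copy is in doubt when it lags the definitive source.
        return self._version != self._source.version


sor = SystemOfRecord()
sor.upsert("ACME", 101.25)   # hypothetical stock quote
branch = ReadOnlyCopy(sor)   # one of many redundant copies
sor.upsert("ACME", 102.00)
print(branch.get("ACME"), branch.is_stale())  # 101.25 True
branch.refresh()
print(branch.get("ACME"), branch.is_stale())  # 102.0 False
```

The redundancy itself is harmless here: any number of copies can exist, and integrity comes from the rule that only the system of record accepts updates while copies are reconciled against it.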

So as long as there is integrity of information – as long as there is a system of record – redundancy is okay.

There are many good reasons for redundancy of information. With redundant information, you can reduce resource contention when many people try to get to the same information at the same time. By spreading information redundantly, you can lower its cost; for example, you can spread copies across a number of inexpensive PCs. And with redundancy, you can reach people and places that would otherwise be difficult to reach.

In short, there are a lot of good reasons for redundant information. Those reasons are not going away.

Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.
