What is data redundancy?

The truth is redundancy of data is absolutely a normal part of life.

This article originally appeared on the BeyeNETWORK.

Conventional wisdom holds that redundancy of data is an inefficient and costly waste of resources, and that IT technicians, if given the opportunity to use their skills and intelligence, can eliminate or at least minimize it.

As long as there were only one or two applications, redundancy of data was never much of an issue. But the number of applications continued to grow, and the ones that already existed continued to expand. Soon there were many applications with significant overlap between them. The same data appeared in different places, but with different values. Furthermore, reconciliation of values was impossible: there was no single place that an organization could rely upon to determine the correct value.

Another problem was that this redundancy of data required maintenance. When it came time to make a change, the change had to be applied in multiple places. Under these circumstances redundancy of data got a "bad name". But was this justified? Was redundancy of data the real culprit here? According to the collective intelligence of technicians, if we got rid of redundancy, we would get rid of many problems associated with it. 

The truth is that redundancy of data is absolutely a normal part of life. Take time, for example. Time is everywhere. It is on your wristwatch. It is on the television. It is on the clock in your house. It is on the radio. It is in your room at the hotel. In short, almost everywhere you look you find the time. If ever there was a piece of information that was redundant and ubiquitous, it would be the time.

And do we have a problem with redundancy of time? Not at all. If we find a clock that has the wrong time, we simply reset it. In fact, twice a year we reset our clocks for daylight saving time. But if it were up to the technician, there would be no redundancy. There would be one and only one clock in the world, and everyone would have to conduct their business based on that one clock, even though only a few people could ever see it. This is, of course, an absurd proposition.

Clearly redundancy of data is a good thing

So what is the problem with all of these applications and the massive amount of data redundancy that we have? The problem is that there is no single "system of record". A system of record is the one designated place where data is captured and updated. From that single place, data is copied. The early application designers didn't understand architecture. Everyone captured data, edited it and updated it. As a result, no one knew what the real values should be. It wasn't the data that was bad. It was the architectural treatment of the data that was to blame.

There it is! It is the architecture that was created to support the early applications that is the problem. Redundancy of data is merely a byproduct of an improper architecture. The reality is that massive amounts of redundant data should reside within the corporation.  As long as there is a proper architecture supporting that redundant data, there is nothing that is wrong or out of place.
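The system-of-record idea can be made concrete with a small sketch. The Python below is only an illustration under assumed names (SystemOfRecord, RedundantCopy, the "billing" and "marketing" copies and the sample address are all hypothetical, not taken from any real system): values are captured and updated in exactly one place, and every redundant copy is refreshed from that place and cannot be edited on its own.

```python
from types import MappingProxyType


class SystemOfRecord:
    """Hypothetical single place where data is captured and updated."""

    def __init__(self):
        self._values = {}
        self._copies = []

    def register_copy(self, copy):
        """Attach a downstream copy and bring it up to date immediately."""
        self._copies.append(copy)
        copy._refresh(self._values)

    def update(self, key, value):
        """Change a value here, then propagate the change to every copy."""
        self._values[key] = value
        for copy in self._copies:
            copy._refresh(self._values)


class RedundantCopy:
    """Hypothetical read-only duplicate; it deliberately has no update method."""

    def __init__(self, name):
        self.name = name
        self._snapshot = MappingProxyType({})

    def _refresh(self, values):
        # Called only by the system of record. The snapshot is a private,
        # read-only view, so local edits cannot drift away from the source.
        self._snapshot = MappingProxyType(dict(values))

    def read(self, key):
        return self._snapshot[key]


def demo():
    record = SystemOfRecord()
    billing = RedundantCopy("billing")
    marketing = RedundantCopy("marketing")
    record.register_copy(billing)
    record.register_copy(marketing)

    # The address is captured once and later corrected once, in one place.
    record.update("customer_42_address", "12 Main St")
    record.update("customer_42_address", "98 Oak Ave")

    # Both redundant copies agree, because neither was edited directly.
    return (billing.read("customer_42_address"),
            marketing.read("customer_42_address"))
```

In this sketch the data is still stored three times, yet there is never any doubt about the correct value. That is the point of the argument above: the redundancy is harmless once a single place owns capture and update.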

About the author:
Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.
