This article originally appeared on the BeyeNETWORK.
A previous article addressed the subject of building an imperfect data warehouse. An imperfect data warehouse is what a large organization needs when it is beset by paralysis born of its own massive infrastructure. While an enterprise data warehouse is optimal, it may not be possible. From a strictly practical standpoint, an imperfect data warehouse that represents less than the full enterprise may be the only realistic option for some organizations.
So, when building an imperfect data warehouse, what conditions and issues does the designer need to be alert to? At a minimum, the design team should examine and be aware of the following:
Overlap. Avoid overlap between the data warehouse and other corporate systems wherever possible. Where overlap is inevitable, minimize it. If you are going to build a data warehouse that covers only a subset of the enterprise, strive for the minimum amount of overlap with other data that exists in the corporation. Admittedly, this may not be easy; however, it should at least be a goal.
Use commonly accepted key structures. Attempt to make use of as many “standard” key structures as you can. Unless there is no other way, do not invent a new key structure out of the blue.
Use commonly accepted encoding values. Do not invent a brand new encoding structure for data in your data warehouse unless there is absolutely no other alternative. Occasionally, an organization will have multiple encoding structures for the same data. Choose an existing encoding structure that appears to fit the needs of the most people.
Use standard metadata terms wherever possible. Such terms typically name and describe tables, attributes, and other structures. If there happens to be a dominant set of metadata terms already in place, use it. Unfortunately, that often is not the case. Nevertheless, you should attempt to reuse as much of the existing metadata as is practical.
Keep the data in the data warehouse at the lowest level of usefulness. Do not compromise on the level of granularity. If no one else in the organization has data at the lowest level of usefulness, then create the data warehouse at the lowest level. This issue cannot be compromised. Stated differently, if it is compromised, there will be a heavy price to pay later.
Be ready to supply default values for data inside the data warehouse where no source values exist.
Make sure your data warehouse design is based on normalization. The more normalized your data warehouse is, the easier it will be to add to it and to tear it apart later. The less normalized the design is, the more difficult it will be to extend later. This is not the time to introduce star joins and fact tables into a data warehouse design.
Do not build your data warehouse using a big bang methodology. Instead, build it a single iteration at a time. (This important rule is true regardless of whether you are building a “perfect” or an “imperfect” data warehouse.)
Avoid redundancy within your data warehouse. It is bad enough that you will have redundancy with other data in the organization.
Keep your ETL functions as flexible as possible. It is inevitable that they will undergo change over time.
Gather metadata about your own data warehouse in an enterprise metadata repository. Make sure you have a good handle on what you are doing.
Take a good look at DW 2.0 and use the principles found there as a basis for your design of the data warehouse.
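Three of the practices above (choosing one commonly accepted encoding, supplying default values where no source value exists, and keeping ETL flexible) can be illustrated together in a short Python sketch. It is only one possible approach, not a prescribed implementation. Every name in it (`FieldRule`, `RULES`, `transform`, the source fields `sex` and `ctry`, and the code values themselves) is hypothetical and chosen purely for illustration. The assumption is that transformation rules live in data rather than in code, so that they can change over time without the ETL logic being rewritten.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldRule:
    """How one warehouse column is filled from a source record (hypothetical)."""
    source_field: str                 # name of the field in the source system
    default: Any = None               # value to use when the source has none
    encoding_map: dict = field(default_factory=dict)  # source code -> standard code


# Hypothetical rules. In practice these might be loaded from a metadata
# repository or a configuration file instead of being hard-coded.
RULES = {
    "gender": FieldRule(
        source_field="sex",
        default="U",
        # Several source encodings are folded into one existing standard (M/F).
        encoding_map={"1": "M", "2": "F", "male": "M", "female": "F",
                      "m": "M", "f": "F"},
    ),
    "country": FieldRule(source_field="ctry", default="UNKNOWN"),
}


def transform(record: dict, rules: dict) -> dict:
    """Build one warehouse row from one source record using the given rules."""
    row = {}
    for column, rule in rules.items():
        raw = record.get(rule.source_field)
        if raw is None or raw == "":
            # No source value exists: supply the configured default.
            row[column] = rule.default
        elif rule.encoding_map:
            # Normalize the source code, then translate it to the standard
            # encoding. Unrecognized codes also fall back to the default.
            key = str(raw).strip().lower()
            row[column] = rule.encoding_map.get(key, rule.default)
        else:
            # No re-encoding is defined for this column: pass the value through.
            row[column] = raw
    return row
```

For example, `transform({"sex": "Female", "ctry": ""}, RULES)` yields a row in which `gender` is `"F"` and `country` is `"UNKNOWN"`. Falling back to the default for unrecognized codes is a design choice made here for brevity; a real ETL process might instead log or reject such records.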
Undoubtedly, there are other design practices that need to be followed in building the imperfect data warehouse. This short list represents only the most obvious and the most important of those practices.
In a way, building an imperfect data warehouse is a better option than building nothing at all. And building an imperfect data warehouse – in some cases – may be the only viable option to move forward.
Author’s note: In 2006, Bill Inmon introduced the architecture for the next generation of data warehousing. That architecture is found at www.inmoncif.com under the section for DW 2.0. Everything on the site is free and available for noncommercial usage.