How long should the data model take?

There are several approaches for data modeling to ensure that the length of time spent in model development is appropriate for the organization.

This article originally appeared on the BeyeNETWORK.

It defies sensibilities to suggest that there is a certain length of time that it takes to build a data model, but there are some practical realities that exist, even if there is no theoretical limit.

After all, aren’t corporations different? And don’t some corporations have a lot more complexity than others? If that’s the case, why is there some limitation on how long it should take to build the data model?

The truth is that the management of a corporation can wait only so long for the data model to be built. Data modeling requires expensive resources, and data modeling is the gateway to many other activities, such as actually building a system. Therefore, the longer the data modeler takes, the more anxious management becomes. As a point of practicality, there comes a time when the organization can wait no more.

So how does the data modeler accommodate this limitation of practicality? There are several approaches.

Define the Scope of Requirements

The first approach is to carefully define the scope of requirements. As long as the boundaries of the system are undefined, or are defined loosely, the data model will continue forever. Unless you want a perpetual data modeling exercise, you need to define the limits of the data model carefully and explicitly.

In addition, you need to avoid scope creep. Scope creep is a fact of life with data models and data warehouses. The longer the model takes, the worse scope creep becomes. Therefore, the first iteration of the data model should be done quickly.

Model Only Granular Data

The second approach is to model only granular data. If the modeler gets caught up in the trap of modeling derived data, the modeling exercise never ends: there is a huge amount of derived data, there are business rules and formulas that go with it, and those formulas are in a constant state of upheaval. As soon as you define one formula, two others change.
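The distinction can be sketched as follows. This is a hypothetical example (the `sale` table, its columns, and the monthly-revenue formula are all invented for illustration) using Python's sqlite3 module: the model stores only granular facts, and derived values are computed by query, so a change in a business formula changes a query rather than the model.

```python
import sqlite3

# Model only the granular data: one row per sale, no derived columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        sale_date   TEXT NOT NULL,
        amount      REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO sale (customer_id, sale_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 100.0), (1, "2024-01-20", 50.0), (2, "2024-02-03", 75.0)],
)

# Derived data (here, monthly revenue) lives in a query, not in the model.
# When the formula changes, the granular model is untouched.
monthly = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS revenue
    FROM sale GROUP BY month ORDER BY month
""").fetchall()
print(monthly)  # [('2024-01', 150.0), ('2024-02', 75.0)]
```

Because the derived number is never stored, two analysts with different revenue formulas can each run their own query against the same granular rows.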

Build in Iterations

The third approach is to build the data model in iterations. You don’t set out to define everything. You set out to define and build the model for only a subset of the total model. The keys to the other parts of the yet unbuilt model are carefully defined, but that is as far as it goes. The attributes, the structure, the “type of” relationships for the remaining part of the model are not built, or else the data model will take forever.
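A minimal sketch of this idea, again with hypothetical table names: the first iteration models one subject area in full, while the yet-unbuilt parts of the model exist only as carefully defined keys, to be fleshed out in later iterations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Iteration 1 is built in full: key, attributes, structure.
conn.execute("""
    CREATE TABLE customer (
        customer_id    INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        address        TEXT,
        first_purchase TEXT
    )
""")

# ORDER and PRODUCT belong to later iterations. Only their keys are
# defined now -- enough for other parts of the model to reference them --
# and their attributes and relationships are deliberately left unbuilt.
conn.execute('CREATE TABLE "order" (order_id INTEGER PRIMARY KEY)')
conn.execute('CREATE TABLE product (product_id INTEGER PRIMARY KEY)')

built = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(built)  # ['customer', 'order', 'product']
```

The stub tables cost almost nothing to define, yet they pin down the keys so that iteration 2 can attach attributes without reworking iteration 1.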

Avoid the Waterfall Mentality

But perhaps the most important approach is avoiding the waterfall mentality. In the waterfall mentality (which came to us with the SDLC, the systems development life cycle), there was the notion that all activity must be completed before the current phase of design could be concluded. This was one of the foundations of the development of operational systems. But in a data warehouse, for a variety of reasons, it is almost impossible to take the same attitude. In the data warehouse development environment, one part of your system will be in one state of modeling or development, another part will be in a different state of design or development, and yet another part of the project will be in still another state. This is the only rational way the development environment can succeed.

Design for the Most General Interpretation

Finally, data must be abstracted. There should not be a data model component for a man and a separate data model component for a woman. There should be one data model component for an individual, and there should be a data element in the component that specifies gender.

In the same vein, the data model should be built for the most general interpretation of data. Then qualifiers should be added that allow the analyst to separate out different occurrences of data. For example, if CUSTOMER is being modeled, the model should be for the broadest interpretation of CUSTOMER. Then the modeler includes information about the different types of CUSTOMER, such as gender, age, address, when the first purchase was made, when the last purchase was made, and so forth.
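The abstraction can be sketched in a few lines. The table and column names below are hypothetical, chosen only to mirror the CUSTOMER example above: one general component carries qualifier columns (gender, dates of first and last purchase), and the analyst separates occurrences at query time rather than in the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One abstract CUSTOMER component, not one component per kind of customer.
# Gender is a qualifier data element, not a separate entity.
conn.execute("""
    CREATE TABLE customer (
        customer_id    INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        gender         TEXT,
        birth_date     TEXT,
        address        TEXT,
        first_purchase TEXT,
        last_purchase  TEXT
    )
""")
conn.executemany(
    "INSERT INTO customer (name, gender, birth_date) VALUES (?, ?, ?)",
    [("Ann", "F", "1980-04-02"), ("Bob", "M", "1975-11-30")],
)

# The qualifier lets the analyst separate out occurrences as needed.
women = conn.execute(
    "SELECT name FROM customer WHERE gender = 'F'").fetchall()
print(women)  # [('Ann',)]
```

Adding a new category of customer then means adding a row, or at most a qualifier value, rather than a new component in the model.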

There are then some very good ways to avoid the eternal development project:

  • Define the scope of the project up front and manage scope creep.

  • Concentrate on keys and granular data. Avoid derived data.

  • Deliberately operate on the basis of iterative development. Do not try to swallow the elephant whole. Cut the elephant up into small pieces before trying to swallow.

  • Recognize that a properly built data warehouse environment will have parts of the project in different states of repair at any moment in time.

  • Recognize that data must be abstracted, that the data model should be designed for the most general interpretation of data, and that qualifiers must be put into the model so the analyst can choose what type of data is needed for analysis.


Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.
