
Data modeling in the government information factory

The data model is an intellectual roadmap to the contents of the government information factory.

This article originally appeared on the BeyeNETWORK.

One of the essential design components of the Government Information Factory (GIF) is the data model, an intellectual roadmap to the contents of the GIF. The data model is used for many purposes, such as:

  • development - the model helps determine the essential components and how they are related,
  • usage - once built, the data model serves as a map to what is available,
  • maintenance - once the GIF is in maintenance mode, the data model describes how pieces should be added or modified, etc.

But perhaps the most important aspect of the data model for the GIF is that it gives a larger perspective of the environment. The data model permits the designer to step back and look at the information found in the GIF from a holistic perspective, with a high-level view of what data is in the GIF. Given the size and complexity of the GIF, that view can be very useful.

There are three levels of data modeling: a high-level data model, a mid-level data model, and a low-level data model. All three levels of modeling are necessary for the construction of the GIF, although not all components are needed at the beginning.

The high-level data model consists of a description of the major entities of the enterprise and their relationships to each other. Sometimes this structure is called a subject area diagram. Examples of a major entity or subject area are "taxpayer," "agency," and "transaction." Each subject area has major relationships with other subject areas. A relationship may be "the agency reports to Congress" or "a taxpayer makes payments to an agency." The high-level data model is noted for its simplicity and its abstraction.

The following figure shows a simple example of the nomenclature used for a high-level data model. The model represents the major entities of agency, external agency, taxpayer, and transaction. There are relationships between agency and external agency, agency and taxpayer, taxpayer and transaction, and agency and transaction.
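As a rough illustration, the subject areas and relationships named above can be written down as a small data structure. This is only a sketch: the `Relationship` class, the `related_to` helper, the direction of each relationship, and the business-rule wording are assumptions made for the example, not part of any GIF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str         # subject area the relationship starts from
    target: str         # subject area the relationship points to
    business_rule: str  # shorthand rule; the wording here is illustrative only


# Subject areas named in the example above.
SUBJECT_AREAS = {"agency", "external agency", "taxpayer", "transaction"}

# The four relationships named above. Directions and rule text are assumptions.
RELATIONSHIPS = [
    Relationship("agency", "external agency", "agency reports to external agency"),
    Relationship("taxpayer", "agency", "taxpayer makes payments to agency"),
    Relationship("taxpayer", "transaction", "taxpayer initiates transaction"),
    Relationship("agency", "transaction", "agency records transaction"),
]


def related_to(subject_area):
    """Return the subject areas directly related to the given one, sorted."""
    if subject_area not in SUBJECT_AREAS:
        raise KeyError(f"unknown subject area: {subject_area!r}")
    neighbors = set()
    for rel in RELATIONSHIPS:
        if rel.source == subject_area:
            neighbors.add(rel.target)
        elif rel.target == subject_area:
            neighbors.add(rel.source)
    return sorted(neighbors)


print(related_to("agency"))  # ['external agency', 'taxpayer', 'transaction']
```

Note how little the structure holds: names and rules only, with no attributes or keys. That matches the level of abstraction expected of a subject area diagram.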



The relationships found in the high-level data model are those that are based on a business rule. The business rule is usually indicated in shorthand when the relationship is specified. The relationship is usually a 1:n relationship, although it may also be a 1:1 relationship (which is very rare) or an m:n relationship (which is very common). When there is an m:n relationship, it is usually broken into a series of 1:n relationships in the lower-level models.
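One common way to break an m:n relationship into 1:n relationships is to introduce an intermediate record that carries one key from each side. The sketch below assumes a hypothetical m:n relationship between taxpayers and agencies ("a taxpayer makes payments to an agency"); the `payment_id`, `taxpayer_id`, and `agency_id` field names and the sample identifiers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical m:n pairs: a taxpayer may pay many agencies,
# and an agency may be paid by many taxpayers.
PAYS = [("T1", "A1"), ("T1", "A2"), ("T2", "A1")]


def resolve_many_to_many(pairs):
    """Turn each (taxpayer, agency) pair into one intermediate payment record."""
    return [
        {"payment_id": number, "taxpayer_id": taxpayer, "agency_id": agency}
        for number, (taxpayer, agency) in enumerate(pairs, start=1)
    ]


def group_by(records, key_field):
    """Group payment ids under one side's key, showing the 1:n relationship."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key_field]].append(record["payment_id"])
    return dict(grouped)


payments = resolve_many_to_many(PAYS)
print(group_by(payments, "taxpayer_id"))  # {'T1': [1, 2], 'T2': [3]}
print(group_by(payments, "agency_id"))    # {'A1': [1, 3], 'A2': [2]}
```

Each payment record belongs to exactly one taxpayer and exactly one agency, so the single m:n relationship becomes two 1:n relationships: taxpayer to payment and agency to payment.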

Occasionally, there is what is called a recursive relationship. Most relationships point from one entity to another, but a recursive relationship points from the subject area back into itself.

The implementation of a recursive relationship requires special programming techniques in order to be handled properly.
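To make the point concrete, here is a minimal sketch of one such technique: walking a self-referencing hierarchy while guarding against cycles. The example assumes a hypothetical recursive relationship in which an agency reports to a parent agency; the `reporting_chain` function and the sample names are illustrative, not taken from the GIF.

```python
def reporting_chain(agency_id, parent_of):
    """Follow a recursive 'reports to' relationship up to the top of the hierarchy.

    parent_of maps each agency to its parent agency (None at the top).
    A set of visited agencies guards against cycles, which would otherwise
    make the walk loop forever.
    """
    chain = []
    seen = set()
    current = agency_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return chain


# Hypothetical hierarchy: every record points back into the same subject area.
PARENT_OF = {
    "field office": "regional office",
    "regional office": "headquarters",
    "headquarters": None,
}

print(reporting_chain("field office", PARENT_OF))
# ['field office', 'regional office', 'headquarters']
```

The cycle check is the part that ordinary entity-to-entity relationships do not need: because the relationship points back into the same subject area, bad data can make a record its own ancestor.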

Note that there is a scarcity of detail found in the high-level data model, and that the subject areas that are found there are at the highest level of abstraction.

The next level of data modeling is that of the mid-level data model or entity-relationship model. The mid-level data model contains the information needed at a detailed attribute level. The following figure shows a mid-level data model for the government environment containing detailed information about an agency.

The agency itself has its own set of information that includes name, address, and phone number. Over time, there have been multiple managers of the agency, and there are different branch offices. In addition, there are different types of offices: administrative, financial, and customer affairs.

One of the features of the mid-level model is the specification of keys and relationships. Most relationships are of the key/foreign key variety. The keys identify individual records and may be made up of concatenated fields. The keys may or may not be unique; they merely indicate a standard way for the data to be accessed.
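The difference between a unique key and a non-unique key can be shown with a small index built over concatenated fields. This is a sketch under stated assumptions: the manager records, the `agency_id` and `start_year` fields, and the `build_index` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical manager records; an agency has had several managers over time.
MANAGERS = [
    {"agency_id": "A1", "start_year": 2001, "name": "first manager"},
    {"agency_id": "A1", "start_year": 2005, "name": "second manager"},
    {"agency_id": "A2", "start_year": 2003, "name": "third manager"},
]


def build_index(records, key_fields):
    """Index records by a key made of one or more concatenated fields."""
    index = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        index[key].append(record)
    return dict(index)


# A key on agency_id alone is not unique: it reaches every manager of the agency.
by_agency = build_index(MANAGERS, ["agency_id"])
print(len(by_agency[("A1",)]))  # 2

# Concatenating agency_id and start_year happens to be unique in this sample.
by_agency_and_year = build_index(MANAGERS, ["agency_id", "start_year"])
print(all(len(found) == 1 for found in by_agency_and_year.values()))  # True
```

Either key is a legitimate access path; which one is specified depends on how the data is expected to be reached.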

The relationships that are specified for the mid-level data model are key/foreign key relationships. In a key/foreign key relationship, the key resides in a single unit of data. There may or may not be "foreign keys" that point to the basic key. If, however, there is a foreign key, it must point to an existing occurrence of data carrying that key. The relationship is then a 1:n relationship.
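A key/foreign key relationship of this kind can be checked mechanically. The sketch below assumes hypothetical agency and branch office records, with `agency_id` as the key on the agency and as the foreign key on each branch office; `find_orphans` and `offices_per_agency` are illustrative helper names.

```python
# Hypothetical key side: one unit of data per agency key.
AGENCIES = {
    "A1": {"name": "first agency"},
    "A2": {"name": "second agency"},
}

# Hypothetical foreign key side: each branch office points at an agency key.
BRANCH_OFFICES = [
    {"office_id": "B1", "agency_id": "A1"},
    {"office_id": "B2", "agency_id": "A1"},
    {"office_id": "B3", "agency_id": "A9"},  # points at a key that does not exist
]


def find_orphans(children, parents, foreign_key):
    """Return child records whose foreign key matches no existing parent key."""
    return [child for child in children if child[foreign_key] not in parents]


def offices_per_agency(children, parents, foreign_key):
    """Count children per parent key; a parent may have zero, one, or many."""
    counts = {key: 0 for key in parents}
    for child in children:
        if child[foreign_key] in counts:
            counts[child[foreign_key]] += 1
    return counts


print(find_orphans(BRANCH_OFFICES, AGENCIES, "agency_id"))
# [{'office_id': 'B3', 'agency_id': 'A9'}]
print(offices_per_agency(BRANCH_OFFICES, AGENCIES, "agency_id"))
# {'A1': 2, 'A2': 0}
```

Agency A2 has no branch offices, which is allowed. Branch office B3 is the problem case: its foreign key has no occurrence of data to point to.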

The low-level data model is where the physical characteristics of the data model are found. Below is an example of a physical model in which the agency name and other attributes have various physical characteristics.

Some of the considerations to be made in the low-level data model are:

  • length of data in bytes
  • number of records
  • record growth rate
  • physical indexing structures
  • loading parameters
  • sequence of fields of data
  • meaningful naming conventions
  • documentation
  • ability to keep records updated
  • performance of transactions operating on data
  • recovery time for a database
  • backup time, etc.
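The first three considerations in the list combine into a rough storage estimate. The sketch below assumes compound annual growth and ignores indexes, free space, and other storage overhead; the function name and the sample figures are invented for illustration.

```python
def projected_size_bytes(record_length_bytes, record_count, annual_growth_rate, years):
    """Estimate raw data size after a number of years of compound record growth.

    Assumes a fixed record length and a constant annual growth rate;
    indexes and storage overhead are deliberately left out.
    """
    projected_records = record_count * (1 + annual_growth_rate) ** years
    return int(round(projected_records * record_length_bytes))


# Hypothetical table: 200-byte records, 1,000,000 records, 10% growth per year.
print(projected_size_bytes(200, 1_000_000, 0.10, 0))  # 200000000
print(projected_size_bytes(200, 1_000_000, 0.10, 3))  # 266200000
```

An estimate like this feeds directly into several of the other considerations, such as backup time and recovery time, which tend to grow with the volume of data.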

There is a detailed mid-level model for each subject area found in the high-level data model:

In turn, each separate section of the mid-level model has its own physical definition:

Following is the big picture of how all three levels (high, middle, and low) of the data model are related:

The high-level data model, or subject area model, is like a globe of the world. The mid-level data model is like a map of the state of Texas. And the low-level data model is like a street map of Dallas.

At first glance the fully attributed data model below appears to be daunting. There are many attributes, keys, and details to be placed into the data model. Because of the size and the complexity of the data model, it is a natural reaction to declare that the model will never be completed and that the data model poses a major obstacle to the building of the GIF. Fortunately, there are two approaches that can be employed to reduce the complexity of the construction of the data model.

The first approach is to build the data model in iterations:

The GIF can be constructed in iterations (phases) as each part of the data model is completed. Each iteration builds on the previous one until the entire data model has been constructed.

The second approach is to use a generic data model. A generic data model contains the basic elements required for any government agency and is a way to get the data modeling process started quickly. When a generic data model is used, it is understood that the data model will be modified to reflect the unique needs of each agency.

Once the data model is built, the data warehouse can be built. And once the data warehouse is built, the infrastructure surrounding the data warehouse can be built. The atomic data found in the data warehouse provides the foundation needed to serve the entire analytic community of GIF users.


Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.

