Developing high-quality data models

Challenges data modelers face with enterprise data modeling techniques

In this excerpt from Developing High Quality Data Models by Matthew West, readers will learn about some of the challenges data modelers face when implementing enterprise data modeling techniques. Readers will also learn about the role of data models and the key issues surrounding data modeling techniques.

5.2 Challenges in Data Modeling

One of the key challenges data modelers face is that what makes a data model good depends on its purpose. Something may be appropriate in an application data model but not in an enterprise or integration data model. As stated earlier, I shall be looking at what makes a good enterprise or integration data model. Among other things, this means examining what it is about some application data models that can make them unsuitable for direct use as integration data models, and why an integration or enterprise data model cannot be built simply by aggregating the available application data models.

5.2.1 Key Requirements for Information Systems

To manage information, you need to be able to meet the following requirements:

  • Know what information exists and what it is about.
  • Extract portions of the information suitable for a particular purpose.
  • Exchange data between organizations and systems.
  • Integrate information from different sources, resolving which information is about things already described and which is about new things.
  • Share the same data between applications and users with different views.
  • Manage the data, including its history, throughout its life.
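The integration requirement hinges on deciding, for each incoming record, whether it describes something already known or something new. The sketch below assumes, purely for illustration, that every source shares a common identifier (a hypothetical equipment `tag`). In practice the absence of such an identifier is a large part of what makes integration hard, and real matching usually needs more than comparing normalized keys.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape: each source identifies a thing by a tag
    # (for example, an equipment tag number) plus some descriptive data.
    tag: str
    description: str


def normalize(tag: str) -> str:
    """Remove formatting differences that would otherwise hide a match."""
    return tag.strip().upper()


def resolve(known: dict[str, Record], incoming: list[Record]) -> tuple[list[Record], list[Record]]:
    """Split incoming records into those about known things and those about new things.

    Assumes `known` is keyed by normalized tag.
    """
    about_known: list[Record] = []
    about_new: list[Record] = []
    for rec in incoming:
        if normalize(rec.tag) in known:
            about_known.append(rec)
        else:
            about_new.append(rec)
    return about_known, about_new


known = {normalize("P-101"): Record("P-101", "Feed pump")}
incoming = [Record(" p-101 ", "Feed pump A"), Record("V-200", "Separator")]
about_known, about_new = resolve(known, incoming)
```

Here the first incoming record is recognized as being about pump P-101 despite the different formatting, while V-200 is treated as a new thing.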

Meeting some or all of these requirements is often difficult and expensive, so it is worth examining the underlying causes.

5.2.2 The Reality of Computer-Based Information

You may find a number of problems as a result of the way information systems hold data:

  • Arbitrary or inappropriate restrictions are placed on the data that can be held because of the data structures and constraints imposed.
  • Historical data cannot be held because the data structure is designed to hold the current state only, and because it replaces the previous state with the current one when it changes.
  • False data may be introduced to work around restrictions. When a system does not support a data requirement, users may use fields for unintended purposes, so actual usage no longer matches what the data model intended. Sometimes more than one group uses the same field for different purposes, which means that usage not only departs from the intention but is also inconsistent.
  • Uncontrolled redundancy of data arises from the same data occurring and being updated in multiple systems. This requires subsequent reconciliation of different versions.
  • Difficulty may arise in integrating data from different sources because of incompatibility in the definitions and format of data. Indeed, consistency between systems can only be expected if positive action was taken at the outset to ensure consistency.
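The second problem in the list, a structure that holds only the current state, can be seen in a small sketch. The `Pump` classes, the `tag` and `location` fields, and the dates are hypothetical; the point is only the contrast between overwriting a value and recording each state with the date it took effect.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PumpCurrentOnly:
    """Current-state-only design: an update overwrites the previous value."""
    tag: str
    location: str

    def move(self, new_location: str) -> None:
        self.location = new_location  # the previous location is lost


@dataclass
class PumpWithHistory:
    """History-preserving design: each location is kept with the date it took effect."""
    tag: str
    locations: list[tuple[date, str]] = field(default_factory=list)

    def move(self, new_location: str, when: date) -> None:
        self.locations.append((when, new_location))

    def location_at(self, when: date) -> str | None:
        """Return the location in effect on `when`, or None if none is recorded yet."""
        current = None
        for start, loc in sorted(self.locations):
            if start > when:
                break
            current = loc
        return current


a = PumpCurrentOnly("P-101", "Unit 1")
a.move("Unit 2")  # nothing now records that P-101 was ever in Unit 1

b = PumpWithHistory("P-101")
b.move("Unit 1", date(2020, 1, 1))
b.move("Unit 2", date(2022, 6, 1))
```

With the second design, a question such as "where was P-101 in March 2021?" can still be answered after the pump has moved.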

All of these problems either restrict the way a company does business or add to the cost of doing business. Here are some financial and time penalties incurred when these problems are encountered:

  • Translating data is expensive. The cost of interfaces to translate the meaning of data from one system to another can account for 25 to 70 percent of the total cost of a system development project.
  • The need to translate data means that users of different systems can often only share data sequentially, and not concurrently. This can extend the time required for critical business processes.
  • There is a slower response to the need for change in systems. Interfaces cost time as well as money to change.
  • Quality suffers. Uncontrolled replication of data invites errors, which may lead to inferior business decisions.
  • Staff time is wasted trying to locate and reconcile data.

5.2.3 The Role of Data Models

Data models, and especially integration and enterprise data models, support data and computer systems by providing a single definition and format for data. If systems use this definition consistently, they can achieve data compatibility. An integration or enterprise data model provides the definition and format that applications need in order to exchange and integrate data: each application knows how its fields are to be used because of the way they are mapped to the integration or enterprise data model. The results of this are shown in Figure 5-8.
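To make the idea of mapping concrete, the sketch below maps two hypothetical applications, an HR system and a payroll system with different field names and date formats, to one shared entity. The field layouts and the `Person` entity are assumptions for illustration; what matters is that each application is mapped to the integration model, not to the other application.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical integration-model entity that both applications map to."""
    family_name: str
    given_name: str
    birth_date: str  # ISO 8601: YYYY-MM-DD


def from_hr_system(row: dict[str, str]) -> Person:
    # Assumed HR layout: separate SURNAME and FORENAME fields, DOB as DD/MM/YYYY.
    day, month, year = row["DOB"].split("/")
    return Person(row["SURNAME"], row["FORENAME"], f"{year}-{month}-{day}")


def from_payroll_system(row: dict[str, str]) -> Person:
    # Assumed payroll layout: one "name" field as "Given Family", ISO birth date.
    given, family = row["name"].split(" ", 1)
    return Person(family, given, row["birth_date"])


hr_view = from_hr_system({"SURNAME": "Smith", "FORENAME": "Jane", "DOB": "03/07/1980"})
payroll_view = from_payroll_system({"name": "Jane Smith", "birth_date": "1980-07-03"})
```

Once both records are expressed in the shared form they compare equal, so they can be recognized as the same person and merged without a dedicated HR-to-payroll interface.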

However, systems and interfaces often cost more than they should to build, operate and maintain. They may also constrain the business rather than support it. A major cause of this is that the quality of the data models implemented in systems and interfaces is poor.

  • Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces.
  • Entity types are often not defined or incorrectly defined. This can lead to replication of data, data structure and functionality, together with the attendant costs of that duplication in development and maintenance.
  • Data models for different systems are arbitrarily different. This is because each system was approached independently, without regard for how the data might be shared in the future. The result of this is that complex interfaces are required between systems in order to share data.
  • Fixing the data model of the physical database, in the sense of standardizing it, is not sufficient to ensure compatibility. Some organizations have standardized on a particular application in order to fix the data model and ensure compatibility, only to discover later that different implementations of the system were incompatible because of the way they had been configured and used.
  • Data cannot easily be shared electronically with customers and suppliers because the structure and meaning of data have not been standardized. For example, product catalogs, and engineering design data and drawings for process plants, are still sometimes exchanged on electronic paper.
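The first point in the list, business rules fixed in the structure of a data model, is easiest to see side by side. In the hypothetical sketch below, the rule "each employee has exactly one manager" is built into a field in one design and held as data in the other. The class names and the `role` values are assumptions for illustration, not a recommended model.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class EmployeeRigid:
    """Rule fixed in structure: every employee has exactly one manager.

    Moving to matrix management means changing this structure and
    every system and interface that depends on it.
    """
    name: str
    manager: str


@dataclass
class ManagementRelationship:
    """Rule held as data: management is a relationship that can occur any number of times."""
    employee: str
    manager: str
    role: str  # illustrative values such as "line" or "project"


@dataclass
class Organization:
    relationships: list[ManagementRelationship] = field(default_factory=list)

    def managers_of(self, employee: str) -> list[str]:
        return [r.manager for r in self.relationships if r.employee == employee]


org = Organization()
org.relationships.append(ManagementRelationship("Ann", "Bob", "line"))
# A change in business practice is absorbed by adding data, not by restructuring.
org.relationships.append(ManagementRelationship("Ann", "Cara", "project"))
```

If a local rule such as "at most one line manager" is still wanted, it can be checked separately against the relationships, so changing it does not require changing the structure.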

The underlying reason for these problems is the lack of appropriate standards to ensure that data models both meet business needs and are consistent.

5.2.4 Desiderata

The following requirements for data models are derived from the business requirements outlined earlier. Data models should:

  • Meet the data requirement.
  • Be clear and unambiguous to all (not just the authors).
  • Be stable in the face of changing data requirements.
  • Be flexible in the face of changing business practices.
  • Be reusable by others.
  • Be consistent with other models covering the same scope.
  • Be able to reconcile conflicts with other data models.

In addition, it should be possible to develop data models quickly. The ontological approach to data modeling I present here is aimed at producing data models that meet these desiderata (Latin for “desired things”).

5.2.5 Some Key Issues for Data Models

Systems sometimes cost more than they should. Some of the reasons lie in how data modeling is done, or not done, and these are illustrated in Figure 5-9.

Sometimes apparently small enhancements to a system cause major rework in the system or interfaces. This problem points to inflexibility in the original data models.

This is also a major cause of the repeated development of essentially the same system. If “how things are at some time and place” is built into a system, then any restrictions imposed by the system must be accepted by anyone wishing to use it. Otherwise, the system will be misused or rejected. This is the challenge faced by those who build packages.

System interfaces account for 25 to 70 percent of the development and support costs of current systems. The primary reason for this cost is that these systems do not share, and are not mapped to, a common data model. If data models are developed and implemented on a system-by-system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. The physical data model may come from a package, but how the data model is used, and how it is mapped to the corporate or integration data model, will determine the degree of consistency in definitions. This consistency in turn provides the basis for exchanging and integrating data.
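A rough count shows why mapping to a common data model pays off as the number of systems grows. The sketch assumes the worst case, in which every system exchanges data with every other and each bidirectional interface counts once; real system landscapes are usually sparser, so treat these numbers as an upper bound rather than an estimate for any particular organization.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Worst case: every pair of systems has its own translation interface."""
    return n_systems * (n_systems - 1) // 2


def common_model_mappings(n_systems: int) -> int:
    """Each system is mapped once to the shared integration data model."""
    return n_systems


for n in (3, 5, 10, 20):
    # Columns: number of systems, pairwise interfaces, mappings to a common model.
    print(n, point_to_point_interfaces(n), common_model_mappings(n))
```

With 10 systems this gives 45 pairwise interfaces against 10 mappings, and the gap grows quadratically as systems are added.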

Most systems contain the same basic components redeveloped for a specific purpose. For instance, the following can use the same basic classification model as a component:

  • Materials catalog
  • Product and brand specifications
  • Equipment specifications

The same components are redeveloped because we never notice they are the same thing. An integration framework would show they are examples of a more abstract or general pattern.
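As a hypothetical illustration of such a shared component, the sketch below defines one generic classification structure (a class name, the properties its members are described by, and an optional superclass) and reuses it for both a materials catalog entry and an equipment specification. The `Classification` class and the example property names are assumptions, not a structure taken from the book.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical generic classification component reusable across catalogs."""
    name: str
    properties: set[str] = field(default_factory=set)
    parent: Classification | None = None

    def all_properties(self) -> set[str]:
        """Properties defined here plus those inherited from superclasses."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties

    def is_a(self, other: Classification) -> bool:
        """True if this class is `other` or one of its subclasses."""
        node: Classification | None = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Materials catalog use.
pipe = Classification("pipe", {"nominal_diameter", "material"})
carbon_steel_pipe = Classification("carbon steel pipe", {"wall_thickness"}, parent=pipe)

# Equipment specification use of the same component.
pump = Classification("pump", {"rated_flow", "rated_head"})
centrifugal_pump = Classification("centrifugal pump", {"impeller_diameter"}, parent=pump)
```

The same inheritance and property logic serves both uses, which is the kind of common pattern an integration framework would make visible.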

Much of the inconsistency between data models arises from the different ways in which real-world objects are represented in entity-relationship diagrams. Figure 5-10 shows some representations I have found in data models I have reviewed. Between them, they cover all the possibilities.

If the same concepts are modeled in different ways, you cannot expect different models of the same thing to look the same.
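Figure 5-10 is not reproduced here, but the effect can be sketched with three hypothetical ways of recording one fact, "Jane Smith is employed by Acme Ltd": as an attribute of the person, as an entity in its own right, and as a row in a generic relationship table. The neutral (subject, relation, object) form used to compare them is an assumption chosen only to make the comparison possible; it is not presented as the representation the ontological approach arrives at.

```python
from __future__ import annotations

# (a) The employer is an attribute of the person.
model_a = {"person": "Jane Smith", "employer": "Acme Ltd"}

# (b) Employment is an entity in its own right, linking two others by key.
model_b = {
    "people": {1: "Jane Smith"},
    "companies": {7: "Acme Ltd"},
    "employments": [{"person_id": 1, "company_id": 7}],
}

# (c) A generic relationship table of (subject, relation, object) rows.
model_c = [("Jane Smith", "employed_by", "Acme Ltd")]


def neutral_from_a(m: dict) -> set[tuple[str, str, str]]:
    return {(m["person"], "employed_by", m["employer"])}


def neutral_from_b(m: dict) -> set[tuple[str, str, str]]:
    # Follow the keys from each employment to the people and companies it links.
    return {
        (m["people"][e["person_id"]], "employed_by", m["companies"][e["company_id"]])
        for e in m["employments"]
    }


def neutral_from_c(rows: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    return set(rows)
```

The three structures look nothing alike, yet each reduces to the same fact; recognizing this takes an explicit mapping for every model, which is the cost that differing representations impose.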

The differences in how things get modeled are caused by building models that have a specific application viewpoint, or specific rules and constraints, built in. Since others may have different rules or a different application viewpoint, we need to understand how to represent the world in a neutral way, so that the resulting integration and enterprise data models are flexible and stable.