Developing high-quality data models

Limitations of entity relationship models in data modeling

In this excerpt from Developing High Quality Data Models by Matthew West, readers will learn about the limitations of entity relationship models in the data modeling process and how these limitations affect what type of data model an organization should use. Readers will also gain insight into the debate over data models vs. reference data.

5.1 Limitations of Data Models

Entity relationship models are limited in what they can express. It helps to be aware of these limitations up front, since they can affect the choices you make about how to represent the enterprise you are modeling.

5.1.1 One Entity Type Cannot Be Shown as Having Another as an Instance

I will talk about classification (when something is a member or instance of a class) more generally later. Here you need to understand what is explicit in the entity relationship paradigm. The only classification that the entity relationship paradigm supports directly is records or objects being instances of entity types. So if one entity type happens to be an instance of another, there is no way to say this in an entity-relationship diagram.

At first glance, this seems counterintuitive. Surely, I can create a data model like the one in Figure 5-1 to say that pump (the entity type) is a member of the entity type equipment type?

The problem with this is that the lines we sometimes call relationships are really classes of relationship. This data model actually says:

“Each pump is a member of exactly one equipment type.”

So, according to this data model, it is not the entity type pump that is an instance of equipment type; rather, each instance of pump is a member of an instance of equipment type. The relationship line represents the set of these memberships.
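To make the point concrete, here is a minimal Python sketch under an assumed analogy: entity types become Python classes and records become objects. The names EquipmentType and Pump and the equipment type values are hypothetical, not taken from Figure 5-1. The relationship line corresponds to a set of pairs of instances, and nothing in the model makes the class Pump itself an instance of EquipmentType.

```python
from dataclasses import dataclass

# Analogy (an assumption): entity type ~ Python class, record ~ object.
@dataclass(frozen=True)
class EquipmentType:
    name: str

@dataclass(frozen=True)
class Pump:
    tag: str
    equipment_type: EquipmentType  # each pump is a member of exactly one equipment type

# Hypothetical instances of equipment type.
centrifugal = EquipmentType("centrifugal")
reciprocating = EquipmentType("reciprocating")
pumps = [Pump("P1", centrifugal), Pump("P2", reciprocating)]

# The relationship line stands for this set of (pump, equipment type) pairs:
# it relates instances of pump to instances of equipment type.
is_member_of = {(p.tag, p.equipment_type.name) for p in pumps}
print(sorted(is_member_of))  # [('P1', 'centrifugal'), ('P2', 'reciprocating')]

# The entity type Pump itself is not an EquipmentType; nothing above makes it one.
print(isinstance(Pump, EquipmentType))  # False
```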

There is, however, one type of relationship that is between entity types, and that is the subtype/supertype relationship.

Figure 5-2 shows a subtype/supertype relationship using the EXPRESS notation. You read it as follows:

“Pump is a subtype of equipment item.”

or, equivalently,

“Each pump is also a member of equipment item.”

This is a relationship between the entity types; however, it is not a classification relationship but a specialization relationship. A Venn diagram of real-world pumps and equipment items makes the difference clearer.

In the real world, P1 and P2 are directly members of both the class pump and the class equipment item, as illustrated in Figure 5-3.
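Because a Venn diagram treats classes as sets, the distinction can be sketched with Python sets. The member V1 (a vessel) is a hypothetical addition, and frozenset is used only so that one class can be tested for membership in another.

```python
# Real-world classes as sets of things (V1 is a hypothetical vessel).
pump = frozenset({"P1", "P2"})
equipment_item = frozenset({"P1", "P2", "V1"})

# Classification: an individual pump is directly a member of both classes.
print("P1" in pump, "P1" in equipment_item)  # True True

# Specialization: the class pump is a subset of the class equipment item.
print(pump <= equipment_item)  # True

# The relationship between the classes is not membership:
# pump is not one of the members of equipment item.
print(pump in equipment_item)  # False
```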

A complication is that the situation is slightly different when we are talking about data records. Figure 5-4 shows some possible tables that might be used to represent instances of the real-world pump and equipment item classes. The key thing to note is that there is a record in each table that represents the pump, whereas in the Venn diagram (and the real world) there is one pump that is a member of two classes. Thus, when a data model describes data that represents pumps, the subtype/supertype relationship says that for each record in the pump table there is also a record in the equipment item table that represents the same pump.

The consequence is that when we use data models to talk about objects in the real world, a subtype/supertype relationship is a relationship instance between the entity types. However, when the data model is a model of the data structures in a database, the subtype/supertype relationship is a class of relationship whose instances are pairs of records, one in each of the two tables, that represent the same thing.
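One common way to realize tables like those in Figure 5-4 is a shared primary key, sketched below with Python's built-in sqlite3 module. The table layout and the impeller_type column are assumptions, not the book's figure. Each row of the join is one instance of the subtype/supertype relationship: a pump record paired with the equipment item record for the same pump.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE equipment_item (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL
);
-- pump is a subtype of equipment item: every pump record shares its id
-- with the equipment_item record that represents the same pump.
CREATE TABLE pump (
    id            INTEGER PRIMARY KEY REFERENCES equipment_item(id),
    impeller_type TEXT  -- hypothetical pump-only attribute
);
""")
conn.executemany("INSERT INTO equipment_item (id, tag) VALUES (?, ?)",
                 [(1, "P1"), (2, "P2"), (3, "V1")])
conn.executemany("INSERT INTO pump (id, impeller_type) VALUES (?, ?)",
                 [(1, "radial"), (2, "axial")])

# Each row pairs a pump record with the equipment_item record for the same pump.
pairs = conn.execute("""
    SELECT p.id, e.tag
    FROM pump AS p JOIN equipment_item AS e ON e.id = p.id
    ORDER BY p.id
""").fetchall()
print(pairs)  # [(1, 'P1'), (2, 'P2')]

# The foreign key rejects a pump record with no matching equipment_item record.
rejected = False
try:
    conn.execute("INSERT INTO pump (id, impeller_type) VALUES (99, 'radial')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```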

It is a very desirable characteristic of data models that they can be developed as models of things that exist in the real world but then be used as models of data records. However, this example makes it clear that you need to be sure which way you are using the data model, since its meaning is subtly different under each interpretation.

5.1.2 An Entity Type Cannot Be Shown as Having an Instance of Any Entity Type as a Subtype

Continuing with the equipment example, let us take a look at another limitation. You cannot show a subtype relationship between an entity type and an instance of any entity type in the data model. This ought to be fairly obvious, because the only relationship that is supported between entity types and instances is being an instance (member) of the entity type. Figure 5-5 illustrates this issue.

This model allows an equipment item to be classified by an equipment type. Pump is an example instance of equipment type. We already saw in the Venn diagram of Figure 5-3 that pump is a subtype of equipment item. However, there is no way to show this in the data model when pump is represented as an instance of equipment type. That is, there is no way to link the pump instance of equipment type to the entity type equipment item. The full set of relationships is illustrated in Figure 5-6, but this is not an entity-relationship model.
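The model in Figure 5-5 might be implemented roughly as follows. This is a sketch using Python's sqlite3 module, and the table and column names are assumptions. Pump appears only as a row of equipment_type, and the only foreign key out of equipment_item is the classification column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE equipment_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- The only link between the two tables: each equipment item is
-- classified by exactly one equipment type.
CREATE TABLE equipment_item (
    id                INTEGER PRIMARY KEY,
    tag               TEXT NOT NULL,
    equipment_type_id INTEGER NOT NULL REFERENCES equipment_type(id)
);
""")
conn.executemany("INSERT INTO equipment_type (id, name) VALUES (?, ?)",
                 [(1, "pump"), (2, "vessel")])
conn.executemany(
    "INSERT INTO equipment_item (id, tag, equipment_type_id) VALUES (?, ?, ?)",
    [(1, "P1", 1), (2, "P2", 1), (3, "V1", 2)])

classified = conn.execute("""
    SELECT i.tag, t.name
    FROM equipment_item AS i JOIN equipment_type AS t ON t.id = i.equipment_type_id
    ORDER BY i.id
""").fetchall()
print(classified)  # [('P1', 'pump'), ('P2', 'pump'), ('V1', 'vessel')]

# 'pump' is just a row. The schema's only foreign key from equipment_item is the
# classification column; nothing says the 'pump' row is a subtype of the
# equipment_item table itself.
fks = [(fk[2], fk[3]) for fk in conn.execute("PRAGMA foreign_key_list(equipment_item)")]
print(fks)  # [('equipment_type', 'equipment_type_id')]
```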

By the way, equipment item is also an instance of equipment type, and this cannot be shown in the entity-relationship diagram either. However, you can create an instance of equipment type that replicates the entity type equipment item. You can then add a specialization relationship between instances of equipment type and show at the instance level that pump is a subtype of equipment item, just as at the data model level you can add an entity type called pump as a subtype of equipment item.
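One hypothetical way to carry the specialization relationship at the instance level is a table linking pairs of equipment types, sketched here with sqlite3. The table name equipment_type_specialization and the extra "centrifugal pump" row are illustrative assumptions. A recursive query then walks up the chain of supertypes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE equipment_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Hypothetical specialization relationship between instances of equipment_type.
CREATE TABLE equipment_type_specialization (
    subtype_id   INTEGER NOT NULL REFERENCES equipment_type(id),
    supertype_id INTEGER NOT NULL REFERENCES equipment_type(id),
    PRIMARY KEY (subtype_id, supertype_id)
);
""")
# 'equipment item' is an instance of equipment_type that replicates the entity type.
conn.executemany("INSERT INTO equipment_type (id, name) VALUES (?, ?)",
                 [(1, "equipment item"), (2, "pump"), (3, "centrifugal pump")])
conn.executemany("INSERT INTO equipment_type_specialization VALUES (?, ?)",
                 [(2, 1), (3, 2)])  # pump < equipment item, centrifugal pump < pump

# All supertypes of 'centrifugal pump' (id 3), following the specialization chain.
supertypes = conn.execute("""
    WITH RECURSIVE up(id) AS (
        SELECT supertype_id FROM equipment_type_specialization WHERE subtype_id = 3
        UNION
        SELECT s.supertype_id
        FROM equipment_type_specialization AS s JOIN up ON s.subtype_id = up.id
    )
    SELECT t.name FROM up JOIN equipment_type AS t ON t.id = up.id ORDER BY t.id
""").fetchall()
print(supertypes)  # [('equipment item',), ('pump',)]
```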

5.1.3 Understanding the Limitations

It is important that you understand the limitations I just described. They will influence how you choose to model a particular situation so that you are able to say the things that are most important.

5.1.4 Data Model vs. Reference Data

A key choice then is in deciding which things are represented as entity types, and which things are represented as instances of entity types. Of course, only classes (things that have members or instances) can be entity types, but you can model classes either as entity types or as instances of entity types. Deciding where to draw the line is an important decision, since it can significantly impact how understandable your data model is and how flexible it is (how stable it is in the face of changing requirements).

As an example, Figure 5-7 shows two ways in which you can show that P101 is a pump. In the first case, P101 is an instance of the entity type pump. In the second case, P101 is classified by an instance of equipment_type.

Notice that there is no way in the second case to show that pump is a subtype of the entity type equipment_item. You can also see that the entity type equipment_type is essentially part of the metamodel for the first case. This means that the second style of data model can feel unfamiliar. On the other hand, the second case is easy to extend with other equipment types as reference data, even after the system has been built. The choice between these two approaches will depend on the purpose of the data model. Indeed, an important design choice when designing database systems is where to place the divide in the ontology between what part is in the data model and what part is in the master and reference data.
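The trade-off can be sketched with two small sqlite3 schemas, one per case in Figure 5-7. The schemas, the compressor equipment type, and the helper tables() are assumptions for illustration. Both answer the question "which items are pumps?", but adding a new kind of equipment is a schema change in the first case and only a new reference data row in the second.

```python
import sqlite3

# Case 1: pump is an entity type, so it becomes a table.
case1 = sqlite3.connect(":memory:")
case1.executescript("""
CREATE TABLE equipment_item (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
CREATE TABLE pump (id INTEGER PRIMARY KEY REFERENCES equipment_item(id));
INSERT INTO equipment_item VALUES (1, 'P101');
INSERT INTO pump VALUES (1);
""")

# Case 2: pump is an instance of equipment_type, i.e. reference data.
case2 = sqlite3.connect(":memory:")
case2.executescript("""
CREATE TABLE equipment_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE equipment_item (
    id INTEGER PRIMARY KEY,
    tag TEXT NOT NULL,
    equipment_type_id INTEGER NOT NULL REFERENCES equipment_type(id)
);
INSERT INTO equipment_type VALUES (1, 'pump');
INSERT INTO equipment_item VALUES (1, 'P101', 1);
""")

def tables(conn):
    """Hypothetical helper: sorted table names in a SQLite database."""
    return sorted(r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))

# Both models answer "which items are pumps?", with different queries.
pumps1 = case1.execute(
    "SELECT e.tag FROM pump AS p JOIN equipment_item AS e ON e.id = p.id").fetchall()
pumps2 = case2.execute(
    "SELECT i.tag FROM equipment_item AS i JOIN equipment_type AS t "
    "ON t.id = i.equipment_type_id WHERE t.name = 'pump'").fetchall()

# A new kind of equipment: a schema change in case 1...
case1.execute("CREATE TABLE compressor (id INTEGER PRIMARY KEY REFERENCES equipment_item(id))")
# ...but only a new reference data row in case 2.
case2.execute("INSERT INTO equipment_type VALUES (2, 'compressor')")

print(pumps1, pumps2)  # [('P101',)] [('P101',)]
print(tables(case1))   # ['compressor', 'equipment_item', 'pump']
print(tables(case2))   # ['equipment_item', 'equipment_type']
```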

One of the techniques we used in developing the ISO 15926-2 data model was to start by developing a composite model of instances and classes without making commitments, and then to decide which classes would be entity types after we had explored the problem space. This turned out to be the fastest way to do the analysis, since it significantly reduced the number of mistakes we were making, even though initially it looked like extra work.

I introduced a way to model classes and instances together in one diagram in Section 2.6, and I used it in Figure 5-6; it allows you to defer the choice of which classes will be entity types. It is based on, and almost identical to, the conventions we used in developing the ISO 15926-2 data model.
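As a rough illustration of deferring the decision, the sketch below stores classification and specialization facts about classes and instances together, then sorts them once a set of entity types has been chosen. It is a hypothetical sketch, not the Section 2.6 notation or the ISO 15926-2 conventions; CompositeModel and its methods are invented here.

```python
from dataclasses import dataclass, field

Pair = tuple[str, str]

@dataclass
class CompositeModel:
    """Hypothetical store of classes and instances, with no commitment yet
    to which classes will become entity types."""
    classification: set[Pair] = field(default_factory=set)  # (member, class)
    specialization: set[Pair] = field(default_factory=set)  # (subclass, superclass)

    def classify(self, member: str, cls: str) -> None:
        self.classification.add((member, cls))

    def specialize(self, sub: str, sup: str) -> None:
        self.specialization.add((sub, sup))

    def split(self, entity_types: set[str]) -> tuple[set[Pair], set[Pair], set[Pair]]:
        # Specializations between two chosen entity types can be drawn as
        # subtype/supertype relationships in the data model.
        in_model = {(s, p) for (s, p) in self.specialization
                    if s in entity_types and p in entity_types}
        # Any other specialization cannot appear in the entity-relationship
        # diagram (Section 5.1.2), so it has to be held as reference data.
        as_reference_data = self.specialization - in_model
        # Members of a chosen entity type become records of that entity type.
        records = {(m, c) for (m, c) in self.classification if c in entity_types}
        return in_model, as_reference_data, records

m = CompositeModel()
m.specialize("pump", "equipment item")
m.classify("P101", "pump")
m.classify("pump", "equipment type")
m.classify("equipment item", "equipment type")

# Case 1: pump is an entity type.
in_model_1, ref_1, records_1 = m.split({"equipment item", "pump"})
# Case 2: pump is reference data classified by equipment type.
in_model_2, ref_2, records_2 = m.split({"equipment item", "equipment type"})

print(sorted(in_model_1), sorted(ref_1), sorted(records_1))
print(sorted(in_model_2), sorted(ref_2), sorted(records_2))
```

In case 1 the pump/equipment item specialization lands in the data model; in case 2 it has to move to reference data, which mirrors the limitation described in Section 5.1.2.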