Data modeling techniques explained: How to get the most from your data

In this interview, author and data modeling instructor Steve Hoberman discusses techniques for dealing with challenges that may arise in the data modeling process.

Steve Hoberman is the author of multiple books on data modeling techniques, including his most recent: Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases. Hoberman is also a prominent data modeling consultant and instructor who has educated more than 10,000 people across five continents. In this interview, he explains common challenges that businesses face in creating data models and offers tips on data modeling techniques that can aid in the development of useful and accurate data models for operational and business intelligence applications, including ones running on NoSQL databases such as MongoDB.

How should businesses decide which of the various forms is most appropriate for their data models? You mention it is based on the intended audience -- can you expand on which models work for which audiences?

Steve Hoberman: In Data Modeling for MongoDB, one of the key themes is for the modeler to be flexible about the visual form the model takes, and sometimes it takes effort not to model with traditional data modeling notation. When you're working with people in application development roles with 'data,' 'developer' or 'database' in their job titles, it is okay to use the traditional data modeling notation of boxes and lines, such as information engineering notation (e.g., the crow's foot) or Unified Modeling Language (UML). For these roles, I often use this notation consistently for all three levels of models -- conceptual, logical and physical.

For roles with 'business' in their job title, it is important to first gauge how comfortable this audience would be with the traditional data modeling notation. Sometimes this audience of business users, business analysts, etc., already knows the traditional data modeling form or is eager to learn it, and sometimes it's better to avoid this notation and use one that they already understand. For example, use spreadsheets as the modeling notation when working with financial business experts, or sometimes even pictures of the concepts (such as a picture of a warehouse to represent the concept of a warehouse).

What are the biggest challenges that businesses face in creating a data model, and what are your suggestions for avoiding or resolving them?

Hoberman: The biggest challenge is correctly capturing the requirements on the data model. Often when the project starts, there are only vague requirements (if requirements at all), and the data model must represent these requirements completely and precisely. Therefore it is a very challenging task to go from ambiguity or vagueness to precision. A lot of questions need to be asked and the results must be documented on the model. This takes time and knowledge of which questions to ask, and often projects lack the time as well as the expertise to answer these questions. Data modeling is the process of learning about the business, and it is a time-consuming and challenging process.

As different uses for modeling such as risk mitigation and reverse engineering gain popularity, how does the role of a modeler change and adapt?

Hoberman: With reverse engineering, instead of starting with a clean slate and driving the data models from new system requirements, we organize attributes and rules according to how systems work today. So the thought process and the deliverables are the same for new development as with reverse engineering, but the starting points are different. Often with risk mitigation and reverse engineering, we are playing the role of 'data archeologist,' using detective skills to determine the meaning of an existing system field and the field's relationships to other fields.
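The 'data archeologist' work described above can be partly mechanized when the existing system is a relational database that exposes its own catalog. The following is a minimal sketch, assuming a SQLite database; the `warehouse` and `shipment` tables, their cryptic column names and the `describe_table` helper are hypothetical and were invented for this illustration. It recovers structure only. Working out what a field such as `shp_dt` actually means still requires the detective work Hoberman describes.

```python
import sqlite3

# Hypothetical legacy schema, created in memory so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse (
        wh_id INTEGER PRIMARY KEY,
        wh_nm TEXT NOT NULL
    );
    CREATE TABLE shipment (
        shp_id INTEGER PRIMARY KEY,
        wh_id INTEGER NOT NULL REFERENCES warehouse(wh_id),
        shp_dt TEXT
    );
    """
)


def describe_table(conn, table):
    """Return the columns and foreign keys SQLite reports for one table.

    The table name is interpolated directly, so it must come from a
    trusted source (here, a hard-coded string).
    """
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [
        {"name": row[1], "type": row[2], "required": bool(row[3]), "pk": bool(row[5])}
        for row in conn.execute(f"PRAGMA table_info({table})")
    ]
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    foreign_keys = [
        {"column": row[3], "references": f"{row[2]}.{row[4]}"}
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"columns": columns, "foreign_keys": foreign_keys}


shipment_model = describe_table(conn, "shipment")
```

The resulting dictionary is a starting point for a reverse-engineered model: the columns become candidate attributes and the foreign keys become candidate relationships, each of which still needs a business definition.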

Do you have any recommendations on how to establish precision when there is conflict over definitions and specifications?

Hoberman: There are a number of techniques that work well, and I'll briefly describe the two I like best. One modeler I know writes all of her own definitions, and she writes these definitions with such precision that she knows they are all incorrect -- then she gives them to the business to be corrected. She finds it is easier for a business user to correct an existing definition than to come up with a definition from scratch. Another modeler I know insists that development teams define a term before naming it. In order to name a concept, we must know what it is. So define it first, and then name it. Both approaches are very effective for coming up with precise definitions.

What are the major differences in modeling with a relational versus a NoSQL database?

Hoberman: When the database is a relational database management system (RDBMS), the database design often resembles the logical data model in terms of structure. The areas where an RDBMS database design differs from its logical data model are primarily due to modifications for performance or tool implications. When the database is a NoSQL database, however, the database design can vary dramatically from the logical data model in terms of structure.
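To illustrate the contrast described above, here is a minimal Python sketch that uses plain dictionaries rather than any real database driver. The customer and order entities, their field names and the `embed_orders` helper are all hypothetical; they do not come from the interview or from Data Modeling for MongoDB. The first two lists mirror a logical model with one structure per entity, which is roughly what a relational design would keep. The derived documents show one possible document-style design in which orders are nested inside their customer.

```python
# Hypothetical logical model: two entities linked by a customer_id key.
# A relational design would usually keep this shape as two tables.
customers = [
    {"customer_id": 1, "name": "Acme Ltd"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250.0},
    {"order_id": 11, "customer_id": 1, "total": 75.5},
    {"order_id": 12, "customer_id": 2, "total": 40.0},
]


def embed_orders(customers, orders):
    """Build one document per customer with its orders nested inside.

    This is one possible document-style design; it is structurally
    different from the two-entity logical model it was derived from.
    """
    documents = []
    for customer in customers:
        # The foreign key is dropped because nesting already implies it.
        nested = [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders
            if o["customer_id"] == customer["customer_id"]
        ]
        documents.append({**customer, "orders": nested})
    return documents


customer_documents = embed_orders(customers, orders)
```

Whether embedding is the right choice depends on how the application reads and updates the data; the sketch only shows that the physical shape can diverge from the logical one.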

How does the schema-less or schema lite nature of NoSQL databases affect the data modeling process?

Hoberman: NoSQL databases allow us to add new fields as we are adding the data (called 'schema-less' or 'schema lite'), and this allows us to more easily prototype and iteratively build the database prior to completing (and sometimes even starting) the data model. In some efforts, the database design is completed first, and then the logical and conceptual models are built for documentation and support purposes. So having a schema-less environment sometimes makes a bottom-up approach (starting from the physical) possible.
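The bottom-up approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real NoSQL workflow: a plain list stands in for a schema-less collection, and the sample documents and the `infer_fields` helper are hypothetical. New fields appear simply because a later document carries them, and the field inventory is then derived from the stored data as raw material for a logical model.

```python
from collections import defaultdict

# A plain list stands in for a schema-less collection (assumption: no real
# database is used here). Each document may carry different fields.
collection = []
collection.append({"name": "Acme Ltd", "city": "Leeds"})
collection.append({"name": "Globex", "city": "Austin", "loyalty_tier": "gold"})
collection.append({"name": "Initech", "phone": "555-0100"})


def infer_fields(documents):
    """Bottom-up step: record, per field, the value types seen and how
    many documents contain that field."""
    summary = defaultdict(lambda: {"types": set(), "count": 0})
    for doc in documents:
        for field, value in doc.items():
            summary[field]["types"].add(type(value).__name__)
            summary[field]["count"] += 1
    return dict(summary)


field_inventory = infer_fields(collection)

# Fields present in every document are candidates for required attributes;
# the rest are candidates for optional ones.
for field, info in sorted(field_inventory.items()):
    status = "required" if info["count"] == len(collection) else "optional"
    print(field, sorted(info["types"]), status)
```

An inventory like this only suggests attributes and optionality; definitions and business rules still have to be confirmed with the people who know the data.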

By taking the time to create a data model, what benefits might a business experience or what complications might they avoid?

Hoberman: The underlying benefit of creating a data model is that the data actually becomes understandable, as others can read it and learn about it. Having this precise document that explains the data leads to tangible business benefits such as savings on developer and support costs and building higher-quality systems that meet requirements and perform well.

This was last published in July 2014

1 comment

The best way to model data is to model it so that it can be used for many purposes. Most data modelers model data for a single purpose, without thinking about how else the data might be used. For example, an ERP data modeler will model data so that it satisfies the OLTP needs of an ERP application, but will not take into consideration downstream uses of that data, such as Business Intelligence or Data Analysis. The same can be said of BI and DA modelers, who are concerned only with satisfying the current need.

However, the Spider Schema Data Model is a data modeling technique that I do not see mentioned in this article, yet it satisfies all data modeling needs!