Steve Hoberman is the author of multiple books on data modeling techniques, including Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases. Hoberman is also a prominent data modeling consultant and instructor who has educated more than 10,000 people across five continents -- including business leaders, business analysts, data modelers, database administrators, developers, data warehouse engineers, project managers and data scientists.
In this interview, he explains common challenges that businesses face in creating effective data models during the analysis and design phases of a project. He also offers tips on data modeling techniques that can aid in the development of useful and accurate data models for operational and BI applications, including ones running on NoSQL databases such as MongoDB.
Editor's note: The following has been edited for length and clarity.
How should businesses decide which of the various forms is most appropriate for their data models? You mention it is based on the intended audience -- can you expand on which model types work for which audiences?
Steve Hoberman: In Data Modeling for MongoDB, one of the key themes is for the modeler to be flexible on the visual form the model takes, and sometimes it requires effort not to model with traditional data modeling notation. When you're working with application development roles that have 'data,' 'developer' or 'database' in their job titles, it is OK to use traditional data modeling notation of boxes and lines, such as information engineering notation (e.g., the crow's foot) or Unified Modeling Language. For these roles, I often use this notation consistently for all three levels of models -- conceptual, logical and physical.
For roles with 'business' in their job title, it is important to first gauge how comfortable this audience would be with the traditional data modeling notation. Sometimes this audience of business users, business analysts, etc., already knows the traditional data modeling form or is anxious to learn it, and sometimes it's better to avoid this notation, using instead a notation that they understand. For example, use spreadsheets as the modeling notation when working with financial business experts, or sometimes even pictures of the concepts -- such as a picture of a warehouse that represents the concept of a warehouse.
What are the biggest challenges that businesses face in designing a data model, and what are your suggestions for avoiding or resolving them?
Hoberman: The biggest challenge is correctly capturing the requirements on the data model. Often when the project starts, there are only vague requirements -- if there are requirements at all -- and yet the data model must represent these requirements completely and precisely. Therefore, it is a very challenging task to go from ambiguity or vagueness to precision. A lot of questions need to be asked and the results must be documented on the model. This takes time and knowledge of which questions to ask, and, often, projects lack the time as well as the expertise to answer these questions. Data modeling is the process of learning about the business, and it is a time-consuming and challenging process.
As different uses for modeling such as risk mitigation and reverse engineering gain popularity, how does the role of a modeler change and adapt?
Hoberman: With reverse engineering, instead of starting with a clean slate and driving the data models from new system requirements, we organize attributes and rules according to how systems work today. So, the thought process and the deliverables are the same for reverse engineering as for new development, but the starting points are different. Often with risk mitigation and reverse engineering, we are playing the role of 'data archeologist,' using detective skills to determine the meaning of an existing system field and the field's relationships to other fields.
Do you have any data modeling techniques for how to establish precision when there is conflict over definitions and specifications?
Hoberman: There are a number of techniques that work well, and I'll briefly describe two I like best. One modeler I know writes all of her own definitions, and she writes these definitions with such precision that she knows they are all incorrect -- then she gives them to the business to be corrected. She finds it is easier for a business user to correct an existing definition than come up with a definition from scratch.
Another modeler I know insists that development teams define a term before naming it. In order to name a concept, we must know what it is. So, define it first, and then name it. Both approaches are very effective for coming up with precise definitions.
What are the major differences in modeling with a relational versus a NoSQL database?
Hoberman: When the database is a relational database management system (RDBMS), the database design often resembles the logical data model in terms of structure. The areas where an RDBMS database design differs from its logical data model are primarily due to modifications for performance or tool implications. When the database is a NoSQL database, however, the database design can vary dramatically from the logical data model in terms of structure.
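As an illustration of how far a document design can drift from the logical model, here is a minimal Python sketch. It is not taken from Hoberman's book: the customer, order and order-line entities, their field names and the embed_orders helper are all hypothetical, and plain dictionaries stand in for database rows and documents. It folds three normalized "tables" into one nested document of the kind a document database such as MongoDB might store.

```python
# Hypothetical relational-style rows: one list per entity in the logical model.
customers = [{"customer_id": 1, "name": "Ada Lopez"}]
orders = [
    {"order_id": 100, "customer_id": 1, "order_date": "2014-06-01"},
    {"order_id": 101, "customer_id": 1, "order_date": "2014-06-15"},
]
order_lines = [
    {"order_id": 100, "product": "Widget", "quantity": 2},
    {"order_id": 100, "product": "Gadget", "quantity": 1},
    {"order_id": 101, "product": "Widget", "quantity": 5},
]


def embed_orders(customer, orders, order_lines):
    """Fold a customer's orders and order lines into one nested document."""
    document = {
        "_id": customer["customer_id"],
        "name": customer["name"],
        "orders": [],
    }
    for order in orders:
        if order["customer_id"] != customer["customer_id"]:
            continue
        # The foreign keys disappear: nesting now expresses the relationship.
        lines = [
            {"product": line["product"], "quantity": line["quantity"]}
            for line in order_lines
            if line["order_id"] == order["order_id"]
        ]
        document["orders"].append(
            {
                "order_id": order["order_id"],
                "order_date": order["order_date"],
                "lines": lines,
            }
        )
    return document


customer_document = embed_orders(customers[0], orders, order_lines)
```

The logical model still has three entities and two relationships, but the physical design here is a single document per customer. Whether to embed or keep separate collections is a design decision driven by how the application reads and writes the data, which is why the two structures can diverge so much.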
How does the schema-less or schema-lite nature of NoSQL databases affect the data modeling process?
Hoberman: NoSQL databases allow us to add new fields as we are adding the data -- called 'schema-less' or 'schema-lite' -- and this allows us to more easily prototype and iteratively build the database prior to completing -- and sometimes even starting -- the data model. In some efforts, the database design is completed, and then the logical and conceptual models are built for documentation and support purposes. So, having a schema-less environment sometimes makes a bottom-up approach -- starting from the physical -- possible.
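The bottom-up approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample documents, their field names and the infer_fields helper are hypothetical, and plain dictionaries stand in for documents read from a schema-less collection. The sketch scans existing documents and summarizes which fields appear, with which value types and how often, which could serve as a starting point for documenting a logical model after the fact.

```python
from collections import defaultdict

# Hypothetical documents from one collection; fields were added as data arrived.
documents = [
    {"_id": 1, "name": "Ada Lopez", "email": "ada@example.com"},
    {"_id": 2, "name": "Raj Patel", "phone": "555-0100"},
    {"_id": 3, "name": "Mei Chen", "email": "mei@example.com", "loyalty_points": 120},
]


def infer_fields(documents):
    """Summarize each top-level field: observed value types and document count."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in documents:
        for field, value in doc.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {
        field: {"types": sorted(types[field]), "count": counts[field]}
        for field in types
    }


field_summary = infer_fields(documents)

# Fields missing from at least one document are candidates for optional attributes.
optional_fields = sorted(
    field
    for field, info in field_summary.items()
    if info["count"] < len(documents)
)
```

A summary like this only shows what the data looks like today. The modeler still has to ask the business what each field means and whether a missing value is allowed or is a data quality problem.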
By taking the time to create a data model and employing the correct data modeling techniques, what benefits might a business experience or what complications might they avoid?
Hoberman: The underlying benefit of creating a data model is that the data actually becomes understandable, as others can read it and learn about it. Having this precise document that explains the data leads to tangible business benefits, such as savings on developer and support costs and building higher-quality systems that meet requirements and perform well.