What is data modeling?
Data modeling is the process of creating a simplified diagram of a complex software system, using text and symbols to represent the data it contains and the way that data will flow. The resulting model serves as a blueprint for building new software or reengineering a legacy application, and it helps ensure that data is used efficiently.
Typically, a data model can be thought of as a flowchart that illustrates the relationships among pieces of data. It enables stakeholders to identify errors and make changes before programming code is written. Alternatively, models can be extracted from existing systems as part of reverse-engineering efforts, as is sometimes done with NoSQL data.
Data modeling is an important skill for data scientists and others involved with data analysis. Traditionally, data models were built during the analysis and design phases of a project to ensure that the requirements for a new application were understood. A data model can become the basis for building a more detailed data schema. Data models can also be used later in the data lifecycle to rationalize data designs that were originally created by programmers on an ad hoc basis.
Types of data modeling
Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified. They initiate new data modeling projects by gathering requirements from business stakeholders.
There are three main types of data models:
- Conceptual data model. This model is a high-level description of a database design that shows how data interrelates and what kind of data can be stored in the database. The intended audience for conceptual data models is the business side of an organization. The conceptual data model defines the data structure that the business requires. Once the conceptual data model is created, it can be refined and translated into a logical data model.
- Logical data model. These models are used to create the database structure and describe the data from a technical perspective. The technical side of an organization uses logical data models as detailed representations of database designs. This data model serves as the basis for the creation of a physical data model.
- Physical data model. This data model is specific to the application and database to be implemented. It is used to create the tables and fields that store database data. A physical data model describes a database design for a specific database management system (DBMS). Both the technical and business sides of an organization use this type of model.
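As a rough illustration of how the three levels relate, the sketch below carries a hypothetical "customer places order" example from a conceptual description to a logical definition and then to a physical schema. The entity names, attributes and the choice of SQLite as the DBMS are assumptions made for this example only.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: business entities and how they relate, no technical detail.
# (Hypothetical example entities.)
CONCEPTUAL_MODEL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


# Logical level: attributes, keys and types, still independent of any DBMS.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    total_cents: int


# Physical level: tables and columns for one specific DBMS (SQLite here).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List the tables the physical model created, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Each level adds detail without changing the business meaning: the conceptual relationship becomes a `customer_id` attribute in the logical model and a foreign-key column in the physical one.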
Data modeling examples
Data modeling emerged in the 1960s as DBMSes became more popular. It enabled organizations to bring consistency, repeatability and disciplined development to data processing. Application users and programmers used the data model as a reference when communicating with database designers.
Some examples of data modeling approaches include the following.
Hierarchical data modeling
Hierarchical data models organize data in a treelike, one-to-many arrangement. This model originally replaced file systems in many popular use cases. IBM's Information Management System is an example of the hierarchical approach, which was widely used in businesses, especially banking. Although hierarchical data models were mostly superseded by relational data models beginning in the 1980s, the hierarchical method is still used today in Extensible Markup Language (XML) and geographic information systems.
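A minimal sketch of the treelike, one-to-many arrangement follows. The `Segment` class, the `find_path` helper and the banking records are hypothetical names chosen for illustration; they are not taken from IBM's Information Management System or any other product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """A node in the hierarchy: exactly one parent, any number of children."""
    name: str
    children: list["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        """Attach a child under this segment and return the child."""
        self.children.append(child)
        return child


def find_path(root: Segment, target: str) -> Optional[list[str]]:
    """Return the names from root down to target, or None if not found."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


# Hypothetical banking data: each record sits under exactly one parent.
bank = Segment("Bank")
branch = bank.add(Segment("Branch-001"))
account = branch.add(Segment("Account-42"))
account.add(Segment("Transaction-7"))
```

Reaching a record means walking down from the root, which is why access in this style depends on knowing the path through the tree.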
Network data modeling
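Network data models developed as a way to give data designers a broad conceptual view of their systems. Unlike a strict hierarchy, a network model lets a record have more than one parent, so it can represent many-to-many relationships. The approach is closely associated with the Conference on Data Systems Languages (CODASYL), a consortium formed in the late 1950s that guided the development of COBOL, a standard programming language that could be used across various types of computers, and later published specifications for network databases.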
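In a network model, unlike a strict hierarchy, a record may have more than one parent (owner) record. The sketch below shows only that structural difference; the `Record` class, the `link` helper and the supplier and project records are hypothetical and do not follow any particular CODASYL syntax.

```python
from dataclasses import dataclass, field


# eq=False keeps identity comparison, which avoids recursing through the
# circular owner/member references when two records are compared.
@dataclass(eq=False)
class Record:
    name: str
    owners: list["Record"] = field(default_factory=list)
    members: list["Record"] = field(default_factory=list)


def link(owner: Record, member: Record) -> None:
    """Connect a member record to an owner, in both directions."""
    owner.members.append(member)
    member.owners.append(owner)


# Hypothetical data: one part is linked to both a supplier and a project.
supplier = Record("Acme Supply")
project = Record("Bridge Project")
bolt = Record("Bolt")

link(supplier, bolt)
link(project, bolt)
```

Here `bolt` has two owners, which a single tree cannot express without duplicating the record.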
Relational data modeling
The relational data model was proposed as an alternative to the hierarchical data model, which required a detailed understanding of the physical data storage employed. The relational data model does not require developers to define data paths.
Relational data modeling was first described in a 1970 technical paper by IBM researcher E.F. Codd. Codd's relational model set the stage for industry use of relational databases, which organize data in tables and relate rows through shared key values, as compared to the hierarchical model, where related data is joined implicitly by its position in the structure. Relational data modeling was coupled with Structured Query Language (SQL), which gained a foothold in enterprise computing as an efficient means to process data.
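The following sketch shows what it means for developers not to define data paths: the query states which rows are wanted, and the database works out how to reach them. It uses Python's built-in `sqlite3` module; the `department` and `employee` tables and the `employees_in` helper are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Finance'), (2, 'Engineering');
INSERT INTO employee VALUES (10, 'Lopez', 2), (11, 'Chen', 1), (12, 'Okafor', 2);
""")


def employees_in(department_name: str) -> list[str]:
    """Return last names of employees in a department, sorted by name."""
    rows = conn.execute(
        """
        SELECT e.last_name
        FROM employee AS e
        JOIN department AS d ON d.dept_id = e.dept_id
        WHERE d.name = ?
        ORDER BY e.last_name
        """,
        (department_name,),
    ).fetchall()
    return [last_name for (last_name,) in rows]
```

The two tables are connected only by the matching `dept_id` values, and the join is expressed in the query rather than built into the stored structure.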
Entity relationship modeling
Relational data modeling took another step forward as the use of entity relationship (ER) models became popular. ER models use diagrams to graphically depict the elements in a database and facilitate the understanding of underlying models.
With relational modeling, data types are determined up front and rarely change over time. Entities, or objects, consist of attributes. For example, an employee entity could include attributes such as last name, first name and years employed. Relationships are visually mapped, providing a way to communicate data design objectives to participants in data development and maintenance. Over time, data architects adopted modeling tools, such as Idera's ER/Studio, Erwin Data Modeler and SAP PowerDesigner, for designing systems.
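The building blocks of an ER model (entities, their attributes and the relationships between entities) can be sketched in a few lines. The `Entity` and `Relationship` classes, the `Department` entity and the "1:N" cardinality label are assumptions made for illustration and are not the notation of any specific modeling tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[str, ...]


@dataclass(frozen=True)
class Relationship:
    source: Entity
    verb: str
    target: Entity
    cardinality: str  # hypothetical label, e.g. "1:N" for one-to-many

    def describe(self) -> str:
        """Render the relationship as a one-line, human-readable statement."""
        return f"{self.source.name} {self.verb} {self.target.name} ({self.cardinality})"


# The employee entity from the text, plus a hypothetical department entity.
employee = Entity("Employee", ("last_name", "first_name", "years_employed"))
department = Entity("Department", ("name", "location"))
employs = Relationship(department, "employs", employee, "1:N")
```

In this sketch, `employs.describe()` returns "Department employs Employee (1:N)", the kind of statement an ER diagram conveys graphically.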
As object-oriented programming advanced in the 1990s, object-oriented modeling gained traction as another way to design systems. Object-oriented approaches are similar to ER methods, but they differ because they focus on object abstractions of real-world entities.
Objects are grouped in class hierarchies, and the objects can inherit attributes and methods from parent classes. This inheritance trait provides some advantages compared with ER modeling; it can help maintain data integrity and supports complex data relationships. At the same time, data models emerged for data warehousing needs. Notable examples are the snowflake schema and star schema dimensional models.
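The inheritance behavior described in this section can be sketched with a small class hierarchy. The `Person`, `Employee` and `Manager` classes and their fields are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str

    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


@dataclass
class Employee(Person):
    # Inherits first_name, last_name and full_name() from Person.
    years_employed: int

    def is_veteran(self, threshold: int = 10) -> bool:
        """True when years_employed meets the (assumed) threshold."""
        return self.years_employed >= threshold


@dataclass
class Manager(Employee):
    # Inherits everything from Employee and adds one attribute.
    direct_reports: int = 0


manager = Manager("Dana", "Lopez", 12, direct_reports=4)
```

A `Manager` object carries the attributes and methods of both parent classes without redefining them, so a rule written once on `Employee` applies to every subclass.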
Graph data modeling
An offshoot of hierarchical and network data modeling is the property graph model, which, together with graph databases, is increasingly used for describing complex relationships within data sets. It is popular in social media, recommender and fraud detection applications.
Using the graph data model, designers describe their system as a connected graph of nodes and relationships. Graph data models can be used for text analysis and to create models that uncover relationships among data points within documents.
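A connected graph of nodes and relationships, each carrying properties, might be sketched as follows. The `PropertyGraph` class, the `FOLLOWS` relationship type and the `suggest_connections` helper are hypothetical; production graph databases provide their own storage and query languages.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    # node id -> properties
    nodes: dict[str, dict] = field(default_factory=dict)
    # (source id, relationship type, target id, properties)
    edges: list[tuple[str, str, str, dict]] = field(default_factory=list)

    def add_node(self, node_id: str, **properties) -> None:
        self.nodes[node_id] = properties

    def add_edge(self, source: str, rel_type: str, target: str, **properties) -> None:
        self.edges.append((source, rel_type, target, properties))

    def neighbors(self, node_id: str, rel_type: str) -> set[str]:
        """Targets reachable from node_id over one edge of the given type."""
        return {t for s, r, t, _ in self.edges if s == node_id and r == rel_type}


def suggest_connections(graph: PropertyGraph, person: str) -> set[str]:
    """People two FOLLOWS hops away whom the person does not already follow."""
    direct = graph.neighbors(person, "FOLLOWS")
    second_hop: set[str] = set()
    for followed in direct:
        second_hop |= graph.neighbors(followed, "FOLLOWS")
    return second_hop - direct - {person}


# Hypothetical social data.
graph = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    graph.add_node(name, kind="person")
graph.add_edge("alice", "FOLLOWS", "bob", since=2021)
graph.add_edge("alice", "FOLLOWS", "dave", since=2022)
graph.add_edge("bob", "FOLLOWS", "carol", since=2020)
graph.add_edge("bob", "FOLLOWS", "alice", since=2021)
graph.add_edge("dave", "FOLLOWS", "carol", since=2023)
```

Traversing relationships in this way is the kind of query that recommender applications rely on; here the only suggestion for `alice` is `carol`.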