The potential benefits of cloud computing are inspiring senior IT and business leaders in many organizations to reconsider enterprise data strategy and contemplate how migrating data and applications to the cloud can motivate the modernization of a data architecture.
In the past, data organization and architecture were typically left to the IT department. But with the raised awareness of data's business value comes the realization that an effective data strategy influences how transactional and operational data drives the analytics applications that feed into judicious decision-making and profitable outcomes.
This increased scrutiny raises questions about different facets of organizing and managing data, particularly data modeling vs. data architecture. We'll explore how data modeling and data architecture differ, the relationship between data modeling and data architecture as part of the data management process, and the various roles of data modelers and data architects.
Data modeling basics
A data model is an abstract representation of the real-world entities that interoperate within an organization's business environment. It represents data entities, their attributes and how those entities relate to each other. There are three types of data models: conceptual, logical and physical.
- Conceptual data model. This model shows a high-level view of the data the enterprise uses to support business processes. Though this model represents the conceptual objects used by the organization, it's generally unassociated with a specific application or database management system. Rather, it's intended to capture the business's overall information requirements. The conceptual data model is used to communicate with the business to ensure that information processing needs will be met -- for example, the definition of conceptual entities such as customer, product and location and the relationships among those entities: "A customer living at a location purchased a product."
- Logical data model. This model captures details of the characteristics, attributes and relationships among the different entities. It provides the technical perspective of the data objects described in the conceptual model with the definition of specific attributes of each entity -- for example, associating attributes of a customer, such as last name, first name and a customer number, as well as linking a customer to a specific residential location that includes street, city, state and ZIP code attributes.
- Physical data model. This model is specific to the application and storage framework used for that data. For each entity modeled, the physical data model lists the data elements, their data types, lengths and other characteristics relevant to the underlying database management system or alternative storage environment.
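To make the distinction between the logical and physical levels concrete, here is a minimal Python sketch built on the customer and location example above. The class names, column lengths and the choice of SQLite are assumptions made for illustration, not a prescribed design: the dataclasses stand in for a logical model, and the DDL string stands in for a physical model targeted at one specific database management system.

```python
import sqlite3
from dataclasses import astuple, dataclass


# Logical model: entities, attributes and relationships, independent of any DBMS.
@dataclass(frozen=True)
class Location:
    location_id: int
    street: str
    city: str
    state: str
    zip_code: str


@dataclass(frozen=True)
class Customer:
    customer_number: int
    last_name: str
    first_name: str
    location_id: int  # relationship: a customer lives at a location


# Physical model: the same entities expressed for one specific DBMS (SQLite),
# with data types, lengths and keys. The lengths are illustrative assumptions;
# SQLite accepts declared lengths but does not enforce them.
PHYSICAL_DDL = """
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    street      VARCHAR(100) NOT NULL,
    city        VARCHAR(50)  NOT NULL,
    state       CHAR(2)      NOT NULL,
    zip_code    CHAR(5)      NOT NULL
);
CREATE TABLE customer (
    customer_number INTEGER PRIMARY KEY,
    last_name       VARCHAR(50) NOT NULL,
    first_name      VARCHAR(50) NOT NULL,
    location_id     INTEGER NOT NULL REFERENCES location(location_id)
);
"""


def build_database() -> sqlite3.Connection:
    """Instantiate the physical model in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PHYSICAL_DDL)
    return conn


def save(conn: sqlite3.Connection, location: Location, customer: Customer) -> None:
    """Store one location and one customer; field order matches column order."""
    conn.execute("INSERT INTO location VALUES (?, ?, ?, ?, ?)", astuple(location))
    conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", astuple(customer))
    conn.commit()


if __name__ == "__main__":
    conn = build_database()
    save(
        conn,
        Location(1, "12 Elm St", "Springfield", "IL", "62701"),
        Customer(1001, "Rivera", "Ana", 1),
    )
    query = (
        "SELECT c.first_name, l.city "
        "FROM customer c JOIN location l USING (location_id)"
    )
    print(conn.execute(query).fetchone())
```

The logical classes would stay the same if the storage layer changed; only the DDL, which carries the types, lengths and keys, is tied to the underlying system.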
What data modelers do
Data models are developed and refined by data modelers, who engage the business data users and solicit their requirements as a prelude to iteratively refining the conceptual, logical and physical data models. Data modelers work with application developers to understand the business processes implemented by the developed application and determine the best representation for the data that supports that application. A data modeler's tasks include the following:
- Engage the business users to assess their information needs.
- Work with application developers to understand implemented business processes.
- Review the business process and conceptualize the entities that interoperate within the business process.
- Determine how the various entities are related and develop entity relationship diagrams that represent the connections among the entities.
- Identify each entity's characteristics and properties and ensure that the entities can be differentiated within the model.
- Develop a logical data model and validate the model to ensure that it serves the needs of the business application and its consumers.
- Transform the logical representation of the model into a physical representation and work with database administrators to instantiate and manage the data.
- Optimize the model to ensure predictable performance.
- Maintain the metadata -- the "data about the data" describing the data model, its structure and semantics.
Data architecture basics
According to DAMA International's Guide to the Data Management Body of Knowledge, data architecture "includes specifications used to describe existing state, define data requirements, guide data integration and control data assets as put forth in a data strategy." In essence, data architecture includes the following strategies and tactics for managing an organization's end-to-end data lifecycles that inform and drive the operational business processes and analytical decision-making:
- Data selection focuses on which data sets are created within the organization and which ones are acquired from outside the enterprise.
- Data infrastructure includes the evaluation and selection of data platforms and associated data management tools and services, implementation of systems in on-premises data centers and in the cloud, and network configuration.
- Data onboarding and integration is about ingesting data from external sources, validating it based on defined data quality criteria, transforming it into usable formats and integrating it with data from internal business applications.
- Data storage includes the use of relational database management systems (RDBMSes) for structured data; text and comma-separated values files; and NoSQL databases, big data frameworks and cloud object storage services for semi-structured and unstructured data.
- Data utilization identifies the different data consumer communities, assesses their requirements and supports their usage scenarios.
- Data access focuses on the methods of access such as direct querying, extracts and data services.
- Data analysis and presentation includes methods for data organization for reporting and analytical purposes such as the use of a data warehouse and end-user visualization tools.
- Data protection includes perimeter security precautions, encryption methods, and role-based and attribute-based access controls.
- Data governance and stewardship oversees compliance with models, rules and defined policies governing organizational data collection, management and usage.
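As one illustration of the onboarding and integration item in the list above, the following Python sketch validates external records against simple data quality criteria, transforms them into a usable format and merges them with internal records. The field names, the five-digit ZIP code rule and the merge-by-customer-number logic are hypothetical assumptions chosen for the example; a real pipeline would take its rules from the organization's own data quality definitions.

```python
def validate(record: dict) -> list[str]:
    """Return the data quality problems found in one external record."""
    problems = []
    number = str(record.get("customer_number") or "").strip()
    zip_code = str(record.get("zip_code") or "").strip()
    if not (number.isascii() and number.isdigit()):
        problems.append("customer_number must be numeric")
    if not (len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()):
        problems.append("zip_code must be 5 digits")
    return problems


def transform(record: dict) -> dict:
    """Convert a validated external record into the internal format."""
    return {
        "customer_number": int(str(record["customer_number"]).strip()),
        "zip_code": str(record["zip_code"]).strip(),
    }


def onboard(external: list[dict], internal: list[dict]) -> tuple[list[dict], list]:
    """Validate, transform and integrate external records with internal ones.

    Returns the merged records plus the rejected records with their problems.
    """
    merged = {r["customer_number"]: dict(r) for r in internal}
    rejected = []
    for record in external:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
            continue
        clean = transform(record)
        merged.setdefault(clean["customer_number"], {}).update(clean)
    return list(merged.values()), rejected


if __name__ == "__main__":
    internal = [{"customer_number": 7, "last_name": "Rivera"}]
    external = [
        {"customer_number": "7", "zip_code": " 02139 "},
        {"customer_number": "8", "zip_code": "21"},
    ]
    merged, rejected = onboard(external, internal)
    print(merged)
    print(rejected)
```

Keeping rejected records alongside the reasons for rejection, rather than silently dropping them, gives data stewards something to report against the defined quality criteria.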
What data architects do
The role of data architects is much broader than that of data modelers. The job encompasses an array of responsibilities associated with the scope of an enterprise's data strategy that embraces a combination of on-premises platforms and cloud data and application services. A data architect's tasks include the following:
- Outline the data standards and principles that govern data management across data environments, including hybrid on-premises and multi-cloud.
- Scope the types of data management frameworks to be used, including RDBMSes for transactional and operational processing; data warehouses, data marts and data lakes for analytical processing; and end-user data querying and visualization tools.
- Consider operational demands and performance expectations as well as costs and devise a strategy for managing data and applications, increasingly in the cloud.
- Implement a data catalog for listing enterprise data assets along with their characteristics, where those assets are located, access controls and classification of data sensitivity.
- Supervise the use of data modeling tools and technologies, guide the data modelers in developing their models, oversee the data modeling processes and maintain a metadata repository for capturing "data intelligence" about the corporate data landscape.
- Oversee the selection and implementation of data management tools that align with development processes and methodologies.
- Develop and maintain a reference architecture that includes the specification of data domains used across different business applications and organizational lines of business. This framework contributes to the development of an enterprise master data management strategy for unifying domain representations and reducing unnecessary data replication.
- Document how data flows from origination and acquisition points across systems and applications and oversee the development, management and monitoring of data pipelines.
- Outline data integration techniques and processes and select tools for implementation and oversight of integration efforts.
- Specify data access methods and architect data services to support downstream self-service accessibility for data scientists and analysts.
- Document data quality rules and expectations and select and implement tools for managing and reporting compliance with the data quality requirements.
- Define data protection policies and select the right technologies to implement the policies.
- Monitor, audit and report compliance with internal data standards, externally defined regulations and policies, and performance expectations.
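The data catalog task in the list above can be pictured with a short Python sketch. It assumes a hypothetical in-memory catalog in which each entry records an asset's description, location, sensitivity classification and the roles allowed to use it; the class names, sensitivity levels and access rule are invented for illustration and do not reflect any particular catalog product.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class CatalogEntry:
    name: str                 # name of the data asset
    location: str             # where the asset is stored
    description: str          # the asset's characteristics
    sensitivity: Sensitivity  # classification of data sensitivity
    allowed_roles: frozenset[str] = frozenset()  # role-based access control


class DataCatalog:
    """A toy catalog that lists data assets and answers access questions."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"asset already cataloged: {entry.name}")
        self._entries[entry.name] = entry

    def can_access(self, name: str, role: str) -> bool:
        """Public assets are open to every role; others need an explicit grant."""
        entry = self._entries[name]
        return entry.sensitivity is Sensitivity.PUBLIC or role in entry.allowed_roles

    def find_by_sensitivity(self, level: Sensitivity) -> list[str]:
        """List asset names at one sensitivity level, sorted for stable output."""
        return sorted(n for n, e in self._entries.items() if e.sensitivity is level)


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry(
        name="customer",
        location="warehouse/sales/customer",  # hypothetical location
        description="Customer master records",
        sensitivity=Sensitivity.CONFIDENTIAL,
        allowed_roles=frozenset({"data_steward", "sales_analyst"}),
    ))
    print(catalog.can_access("customer", "sales_analyst"))
    print(catalog.find_by_sensitivity(Sensitivity.CONFIDENTIAL))
```

Recording location, sensitivity and access rules in one place is what lets the same catalog support both self-service discovery and compliance reporting.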
Data modeling and data architecture: Different yet complementary
Clearly, data modeling and data architecture differ, essentially reflecting a "micro" (modeling) versus a "macro" (architecture) perspective.
Data modeling focuses on the details, content and structure of all the corporate data assets. The goal is to represent business concepts, their relationships and the domains of values that can populate each entity's attributes.
Data architecture focuses on the global level of the data platforms and tools as well as the standards and guidelines for the policies, processes and oversight of enterprise data management. The goal is to establish a solid framework for corporate data processing, organization and usage.
In the process, data modeling and data architecture complement each other. Well-defined data models not only provide the basis for devising enterprise data storage, access and protection policies, but they also inform the data architect's selections of platforms, tools and technologies. An established data architecture simplifies the data modeler's job, especially when good tools and best practices are provided to frame how enterprise data concepts are defined and attributed.
An integrative approach to data modeling and data architecture indicates that an enterprise has attained a high level of data management maturity.