This article originally appeared on the BeyeNETWORK.
Is your data out of control? Is your enterprise data management fragmented and inconsistent? Can you even answer these questions? If you answered yes to either of the first two questions – or cannot answer them at all – then enterprise data management may be a discipline you should investigate. Because data is the foundation of all business decisions, enterprise data management (EDM) is steadily gaining acceptance as a critical IT function. Data must be carefully managed and controlled to deliver its full usefulness and value to the organization, and to allow sound business decisions to be made and refined.

EDM comprises several components: data quality, data governance, master data management and a managed metadata environment. Typically, we manage each of these components separately, with little or no overlap; sometimes they are managed by many organizations using many disparate tools. But does this approach really accomplish what we are all looking for – consistent, high-quality, dependable, interoperable data assets? If the answer is no, we must ask how we can achieve data nirvana with EDM. A metadata-driven methodology for EDM allows all components to share information about the data as it moves through its life cycle, thereby enabling consistency, accountability and true control of data assets.
What is a metadata-driven EDM? It is simply the centralized management of all metadata to create a semantically rich, robust and dynamic metadata interchange. Metadata flows bi-directionally from source to metadata repository and back to source. The managed metadata environment (MME) becomes the origination point for semantic changes, and also the system of record for security, compliance, access and regulatory policy. And of course, the MME is the single source of truth for all information about data and process.
Let’s examine how a metadata-driven approach could affect each component of EDM.
- Data Quality: To achieve data quality, a holistic view of the data is required. Solving a problem in application A can break application B. Having access to the metadata for all the steps in the life cycle allows the data quality team to easily spot points of failure, origination points and redundancies. When data quality issues are discovered, the metadata will point to the appropriate business and IT contacts so that the process of correcting the data can begin.
- Data Governance: Data governance requires an owner and a steward for every piece of data. Having a named person (a “steward”) who is responsible for the care and feeding of the data at any given point in its life cycle expedites the resolution of data quality issues and change requests, influences the development of new applications and enables reuse of the data. Governance also helps us understand whether the data is enterprise-level or specific to a business unit or subject area – knowledge that can be extremely valuable in data warehouse development. Since data stewards understand their data intimately, drive standards and data quality, and generally have a vested interest in “their” data, they can be wonderful champions for a metadata-driven approach.
- Master Data Management (MDM): MDM is defined as the formulation and implementation of a unified set of principles, processes and practices, fully supported by a governing body, to provide consistent management for all corporate master data environments. MDM is such a logical way to track, explain, understand, compare and report on master data that it should be a fundamental practice in all organizations.
For most of us, master data is the beginning of the data life cycle. A managed metadata environment allows us to understand redundant master data (sounds like an oxymoron, doesn’t it?) and whether we can clean it and reuse it. Once again, the data steward plays a critical role in this process. Nowhere in the data life cycle is impact analysis more critical than in targeting master data to be managed as part of the enterprise’s key assets. Complete, accurate and reportable metadata easily reveals the impact of changes to master data on ALL systems and their owners, which means no nasty surprises in data usage at any point in the life cycle.
- Managed Metadata Environment (MME): While we certainly capture information about data quality, data governance and master data in our MME, isn’t there more to our total data galaxy? What about our OLTP systems, data warehouses, messages and unstructured data? They should all be part of the MME. An MME allows us to link all data to its stewards, master data sources (if not the original source) and applications that access or update it. An MME shows the flow of data into and through the OLTP systems, and who is responsible for those systems as well as for the data created and updated in them. It allows us to see these systems as both sources and targets, for other OLTP systems as well as for data warehouse environments. What happens to the data as it flows along its path? Where are our points of failure? What metrics are we capturing? All this information should be part of an MME.
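As a toy illustration of the ideas above, the linkage an MME maintains between data elements, stewards, source systems and consumers can be sketched as a simple lineage graph, with impact analysis as a traversal of that graph. The element names, stewards and systems below are invented for this sketch; a real metadata repository would be far richer:

```python
from dataclasses import dataclass, field

@dataclass
class DataElement:
    """One entry in a (hypothetical) managed metadata environment."""
    name: str
    steward: str                 # named person responsible for the element
    source_system: str           # system of record / origination point
    consumers: list = field(default_factory=list)  # downstream systems or elements

# Toy MME: master "customer_id" flows from CRM into billing and the warehouse.
mme = {
    "customer_id": DataElement("customer_id", "J. Smith", "CRM",
                               consumers=["billing", "customer_dim"]),
    "customer_dim": DataElement("customer_dim", "A. Jones", "warehouse",
                                consumers=["sales_mart"]),
}

def impact_of_change(element: str, mme: dict) -> set:
    """Walk the lineage graph to find everything a change to `element` touches."""
    affected, stack = set(), [element]
    while stack:
        current = mme.get(stack.pop())
        for consumer in (current.consumers if current else []):
            if consumer not in affected:
                affected.add(consumer)
                stack.append(consumer)
    return affected

print(sorted(impact_of_change("customer_id", mme)))
# ['billing', 'customer_dim', 'sales_mart']
```

Because steward and source are attached to each element, the same lookup that reveals the affected systems also reveals whom to notify – which is exactly the "no nasty surprises" property described above.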
The Metadata-Driven Methodology
If all these attributes are captured, managed and reported from a managed metadata environment, doesn’t it make sense to make the MME the center of our data universe? That’s what a metadata-driven methodology is all about – using the metadata as the starting point for all EDM functions. Imagine a data warehouse front end that simply provides the user with a pick list of metadata objects on which to build a report. Based on the user, the MME would decide which metadata objects to display, which associated objects the user was cleared to see, what the content should contain, what formula to apply to any derived or calculated fields, and how and where to deliver the finished report. All this is metadata. Unfortunately, it is currently captured in many places by many applications and not typically managed as an asset. Having a single source for metadata knowledge is akin to having the Rosetta Stone for the organization’s data.
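A minimal sketch of such a metadata-driven front end might look like the following. The catalog entries, role names and derived-field formula are invented for illustration and stand in for what an MME would actually serve up:

```python
# Hypothetical metadata catalog: each object records which roles may see it
# and, for derived fields, the formula the report engine should apply.
catalog = {
    "revenue":      {"roles": {"finance", "exec"}, "formula": None},
    "gross_margin": {"roles": {"finance"},
                     "formula": lambda row: (row["revenue"] - row["cogs"]) / row["revenue"]},
    "headcount":    {"roles": {"hr", "exec"}, "formula": None},
}

def pick_list(user_roles: set) -> list:
    """Return only the metadata objects this user is cleared to see."""
    return sorted(name for name, meta in catalog.items()
                  if meta["roles"] & user_roles)

def evaluate(name: str, row: dict):
    """Apply the metadata-defined formula for derived fields; pass through raw ones."""
    formula = catalog[name]["formula"]
    return formula(row) if formula else row[name]

print(pick_list({"exec"}))                                         # ['headcount', 'revenue']
print(evaluate("gross_margin", {"revenue": 100.0, "cogs": 60.0}))  # 0.4
```

The point of the sketch is that the report definition lives entirely in metadata: changing who sees what, or how a derived field is computed, is a catalog change, not an application change.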
Imagine every new development project starting with the MME to determine what data currently exists which can be reused, who owns the data, how comprehensive it is, what other processes affect it and where it is currently reported. The MME would also determine project scope by performing impact analysis for the entire data life cycle. It would deliver data models to your data architects, transformation rules to your integration architects, and data quality requirements and metrics to your data quality staff – all from a single point of reference.
How to Accomplish Metadata-Driven EDM
Okay, so you agree with the premise that a metadata-driven enterprise data management approach is the right thing to do. Can you actually do it with existing off-the-shelf tools? Probably not, unfortunately. Current metadata management tools are not able to effectively extract and couple all the metadata needed to produce robust data lineage – lineage that must include data stewards, transformation/integration rules, processes, metrics and security. Could you accomplish it by building on existing technology? Probably, but it would not be a trivial exercise.
What is the answer to this conundrum? There are many factors that will help you decide which route to take. First, is your organization currently managing its metadata at the enterprise level? If so, you are on the path to success with a metadata-driven EDM approach. If not, rally the troops and educate your organization and its management on the critical nature of metadata management. Sarbanes-Oxley (SOX) can be a good place to start: management already understands the importance of SOX compliance, and metadata management can make gathering the legally required information for SOX a breeze.
The first step in any data-related project – or any project actually – is requirements gathering. Really understand the metadata requirements for your organization. Can the metadata-driven approach satisfy most of these requirements? If so, you can make a business case to continue your quest. Create a metadata model to help you understand these requirements. In the future, this model may become the basis for extending your current MME.
For some organizations, a large data warehousing project provides a great place to start driving this approach. To successfully build and administer a data warehouse, metadata must be clearly understood and managed. On the sourcing end of the warehouse, metadata describes where the data is extracted from, who owns it, and what transformation, mapping and cleansing activities must occur to get it into the warehouse. Operational metadata is required to properly manage, distribute and secure the data in the data warehouse. Building or interfacing with the data delivery layer of a data warehouse can also be enhanced by well-managed metadata. If you are extracting data from the warehouse into marts, use your MME to capture, track and report on the destination of each data element and the rules applied to it. Operational metrics may also be part of your MME. Obviously, a great deal of metadata is created and should be managed within the data warehouse. Since a data warehouse project is fairly well understood, unlike some of our legacy OLTP systems, it is often a good starting point for a metadata-driven EDM approach.
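For illustration only, the source-to-target mappings described above could be captured as simple records in the MME and queried for a per-element lineage report. The system names, elements and transformation rules here are hypothetical:

```python
# Hypothetical source-to-target mappings as an MME might capture them: each
# record names the source, the warehouse/mart destination, and the
# transformation rule applied along the way.
mappings = [
    {"element": "cust_nm",   "source": "CRM.customer",  "target": "dw.customer_dim.name",
     "rule": "trim + title-case"},
    {"element": "order_amt", "source": "OMS.orders",    "target": "dw.sales_fact.amount",
     "rule": "convert currency to USD"},
    {"element": "order_amt", "source": "dw.sales_fact", "target": "mart.finance.amount",
     "rule": "none (pass-through)"},
]

def lineage_report(element: str) -> list:
    """List every hop an element takes, and the rule applied at each hop."""
    return [f'{m["source"]} -> {m["target"]}: {m["rule"]}'
            for m in mappings if m["element"] == element]

for line in lineage_report("order_amt"):
    print(line)
# OMS.orders -> dw.sales_fact.amount: convert currency to USD
# dw.sales_fact -> mart.finance.amount: none (pass-through)
```

Kept current, records like these answer the warehouse questions posed above – where each element came from, where it lands in the marts, and what rules were applied – from a single point of reference.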
Sound like a big job? It is. This is why we are all counting on the metadata repository vendors to continue expanding and enhancing their products in the years to come. An MME must provide a way to capture, associate, maintain, navigate and report on the most valuable asset in your organization – metadata.
In the final analysis, it is obvious that a metadata-driven EDM is the best way to care for your data, but few organizations are ready for the cultural and technical challenges it presents. By continuing to grow awareness of the essential value of enterprise data and metadata management, by urging software vendors to embrace this vision, and by continuing to support the principles, processes and practices of good data management, we lay the foundation for future realization of metadata-driven enterprise data management.