Building a data catalog is an important initiative for many IT teams, but organizations should not undertake such a project without input from business partners.
First, it is important to define a data catalog. It is a reference application that allows business and technology users to explore data sources, understand the contents of those sources (metadata), connect the data to its source and become more self-sufficient in accessing data and metadata. A data catalog can scan databases and BI systems, and it provides a single point of reference for enterprise metadata management that is faster and more effective than earlier metadata management systems.
The main steps in building a data catalog include:
- Designing a subject area model that will serve as the basis for the data architecture of the data catalog. An effective data catalog follows the business use of the data, not the technical implementation of applications. A subject area model (SAM) that defines each subject and the concepts it contains shows business users where their data is located, independent of applications, files or databases. The data catalog should be built based on the SAM.
- Using data stewards and IT representatives to discover and access metadata from all databases and files. Data catalogs use metadata to identify data tables, files and databases: the catalog scans the company's databases and loads the metadata (not the actual data) into the data catalog. Before an organization begins building a data catalog, it must identify and record its metadata sources. This is a major step, and it requires a solid data stewardship program, because business data stewards are needed to help identify the correct data sources to use.
- Building a metadata dictionary, also called a data dictionary (not a business glossary). A data dictionary contains the description and mapping of every table or file and all of its metadata elements. This data dictionary becomes the basis for the data catalog. Again, the business data stewards are essential here, since they provide the business metadata to be used in the data catalog, organized by source, concept and subject area.
- Profiling the data to provide statistics for data consumers. These profiles are informative summaries that explain the metadata. For example, a database profile often includes the number of tables and files and their row counts.
- Identifying relationships among sources, so that users can discover related data across multiple databases. For example, an analyst who needs consolidated customer information may find through the data catalog that five files in five different systems contain customer data.
- Building data lineage. Extract, transform and load (ETL) tools extract data from source databases, transform and cleanse it and load it into a target database; capturing the metadata from each of these steps records the data's lineage. This enables an analyst to trace errors in the analytics back to their root cause.
- Organizing the catalog for human use, again based on the SAM. Most files and databases are designed for technical use; data catalogs should be designed for data consumers as much as for technologists. In addition, a data catalog should be accessible from desktop computers, tablets and mobile devices.
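The metadata-discovery and data dictionary steps can be illustrated with a minimal Python sketch. It assumes a SQLite source read through the standard-library `sqlite3` module; the `DictionaryEntry` record, the `harvest_metadata` function and the `steward_notes` mapping are hypothetical names used only for illustration. The sketch reads table and column definitions from the database's system catalog, never the rows themselves, and attaches business descriptions supplied by data stewards.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary row: metadata about a column, never its values."""
    source: str
    table: str
    column: str
    data_type: str
    description: str = ""  # business meaning, supplied by a data steward


def harvest_metadata(source_name, conn, steward_notes=None):
    """Build data dictionary entries from a SQLite database's table definitions.

    steward_notes is a hypothetical mapping of (table, column) -> description.
    """
    steward_notes = steward_notes or {}
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, data_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
            note = steward_notes.get((table, column), "")
            entries.append(DictionaryEntry(source_name, table, column, data_type, note))
    return entries


# Example: a small in-memory "CRM" database with two tables and no rows.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
crm.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
notes = {("customers", "email"): "Primary contact email, owned by Sales"}

dictionary = harvest_metadata("crm", crm, notes)
for entry in dictionary:
    print(entry.source, entry.table, entry.column, entry.data_type, entry.description)
```

Because only definitions are read, the catalog holds no copies of business data. In practice, stewards would review and extend the descriptions in the catalog itself rather than hard-code them.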
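For the profiling step, a database profile with a table count and per-table row counts could be produced as shown here. This is a sketch against an in-memory SQLite database using Python's standard `sqlite3` module; `profile_database` and the sample tables are hypothetical. A fuller profile would usually add column-level statistics such as null counts and distinct values.

```python
import sqlite3


def profile_database(conn):
    """Summarize a database: the number of tables and the row count of each table."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    row_counts = {}
    for table in tables:
        # COUNT(*) scans the table but returns only a statistic, not the data itself.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        row_counts[table] = count
    return {"table_count": len(tables), "row_counts": row_counts}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

profile = profile_database(conn)
print(profile)
# {'table_count': 2, 'row_counts': {'customers': 2, 'orders': 3}}
```

The profile is what gets loaded into the catalog, so data consumers can judge a source's size before requesting access to it.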
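Relationship discovery can be sketched with a deliberately simple heuristic: matching column names across systems. This minimal Python example assumes the harvested metadata is available as (system, table, column) triples; the system and table names are invented to mirror the five-system customer example above, and `find_shared_columns` is a hypothetical helper. A name match is only a candidate relationship, which a data steward should confirm.

```python
from collections import defaultdict

# Hypothetical harvested metadata: (system, table, column) triples.
METADATA = [
    ("crm", "customers", "customer_id"),
    ("billing", "invoices", "customer_id"),
    ("billing", "invoices", "amount"),
    ("support", "tickets", "customer_id"),
    ("web", "sessions", "customer_id"),
    ("marketing", "contacts", "customer_id"),
    ("marketing", "contacts", "campaign"),
]


def find_shared_columns(metadata, min_systems=2):
    """Group columns by (case-insensitive) name; keep those found in >= min_systems systems."""
    locations = defaultdict(list)
    for system, table, column in metadata:
        locations[column.lower()].append((system, table))
    return {
        column: places
        for column, places in locations.items()
        if len({system for system, _ in places}) >= min_systems
    }


related = find_shared_columns(METADATA)
for system, table in related["customer_id"]:
    print(f"{system}.{table}")
```

Raising `min_systems` narrows the results to data that is spread widely across the enterprise, which is often where consolidation efforts start.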
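Tracing an error back to its root cause through lineage can be sketched as a graph walk. The example assumes lineage has already been captured as a mapping from each asset to the assets it is loaded from (for instance, from ETL job metadata); the asset names and the `trace_to_sources` function are hypothetical. Starting from a report, it walks upstream breadth-first and returns the source assets where an error could have originated.

```python
from collections import deque

# Hypothetical lineage captured from ETL jobs: asset -> assets it is loaded from.
LINEAGE = {
    "dashboard.revenue": ["warehouse.fact_sales"],
    "warehouse.fact_sales": ["staging.orders_clean", "staging.customers_clean"],
    "staging.orders_clean": ["erp.orders"],
    "staging.customers_clean": ["crm.customers"],
}


def trace_to_sources(asset, lineage):
    """Walk lineage upstream (breadth-first) and return the original source assets."""
    roots, seen, queue = [], {asset}, deque([asset])
    while queue:
        current = queue.popleft()
        upstream = lineage.get(current, [])
        if not upstream and current != asset:
            roots.append(current)  # nothing feeds this asset, so it is a source
        for parent in upstream:
            if parent not in seen:  # guard against cycles and shared ancestors
                seen.add(parent)
                queue.append(parent)
    return roots


print(trace_to_sources("dashboard.revenue", LINEAGE))
# ['erp.orders', 'crm.customers']
```

The `seen` set keeps the walk from looping if the recorded lineage ever contains a cycle, and it visits a shared upstream asset only once.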
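Organizing the catalog around the SAM rather than around source systems might look like the following Python sketch. The subject areas, concepts and asset-to-concept mapping are hypothetical; in practice, the mapping would come from the data stewards' business metadata. Assets are grouped by subject area and concept, so a business user browsing "Customer" sees customer data from every system in one place.

```python
# Hypothetical subject area model: each subject area lists the concepts it contains.
SUBJECT_AREA_MODEL = {
    "Customer": ["customer", "contact"],
    "Sales": ["order", "invoice"],
}

# Hypothetical steward-supplied mapping from physical assets to business concepts.
ASSET_CONCEPTS = {
    "crm.customers": "customer",
    "marketing.contacts": "contact",
    "erp.orders": "order",
    "billing.invoices": "invoice",
    "web.clickstream": "session",  # concept not yet in the SAM
}


def organize_by_subject(model, asset_concepts):
    """Group physical assets under subject area -> concept, independent of source system."""
    concept_to_subject = {
        concept: subject for subject, concepts in model.items() for concept in concepts
    }
    catalog = {}
    for asset, concept in sorted(asset_concepts.items()):
        # Assets whose concept is missing from the SAM are flagged for steward review.
        subject = concept_to_subject.get(concept, "Unassigned")
        catalog.setdefault(subject, {}).setdefault(concept, []).append(asset)
    return catalog


catalog = organize_by_subject(SUBJECT_AREA_MODEL, ASSET_CONCEPTS)
for subject, concepts in sorted(catalog.items()):
    print(subject)
    for concept, assets in sorted(concepts.items()):
        print(f"  {concept}: {', '.join(assets)}")
```

Placing unmapped assets under a separate "Unassigned" heading is one possible design choice: it keeps them visible so stewards can extend the SAM instead of silently dropping data from the catalog.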
Building a data catalog should involve teams from both IT and the business, to ensure that the catalog focuses on users' needs and enables them to manage metadata across the enterprise. Effective data catalog planning, development and implementation can bring metadata management into the business community and provide lasting value for data assets.