
What steps are key to building a data catalog?

An enterprise data catalog can help an organization effectively manage metadata and explore data sources. Expert Anne Marie Smith offers seven tips for creating a data catalog.

Building a data catalog is an important initiative for many IT teams, but organizations should not undertake such a project without input from business partners.

First, it is important to define a data catalog: It is a reference application that allows business and technology users to explore data sources, understand the contents of those sources (metadata), connect the data to the source, and become more self-sufficient with their data and metadata access. A data catalog explores databases and BI systems. It also provides a single point of reference for enterprise metadata management, and it does so faster and more effectively than earlier metadata management systems.

The main steps in building a data catalog include:

  • Designing a subject area model that will serve as the basis for the data architecture of the data catalog. An effective data catalog follows the business use of the data, not the technical applications' implementation. A subject area model (SAM) that defines each subject and the concepts that are contained within the subject shows the business users the location of their data unconstrained by applications or files or databases. The data catalog should be built based on the SAM.
  • Using data stewards and IT representatives to discover and access metadata from all databases and files. Data catalogs use metadata to identify data tables, files and databases. The catalog searches the company's databases and loads the metadata (not the actual data) into the data catalog. Before an organization begins building a data catalog, metadata sources must be identified and recorded. This is a major step and requires that the organization have a solid data stewardship program because business data stewards are needed to provide insight for the correct data sources to use.
  • Building a data dictionary (not a business glossary). A data dictionary contains the description and mapping of every table or file and all of its metadata entities. This data dictionary becomes the basis for the data catalog. Again, the business data stewards are essential here, since they will provide the business metadata to be used in the data catalog -- by source, concept and subject area.
  • Profiling the data to provide statistics for data consumers. These profiles are informative summaries that explain the metadata. For example, the profile of a database often includes the number of tables, files and row counts.
  • Identifying relationships among sources. Discover related data across multiple databases. For example, an analyst may need consolidated customer information. Through the data catalog, she may find that five files in five different systems have customer data.
  • Building data lineage. Extract, transform and load (ETL) tools are used to extract metadata from source databases, transform and cleanse the metadata and load it into a target database. This enables the analyst to trace errors in the analytics back to their root cause.
  • Organizing the catalog for human use, again based on the SAM. Most files and databases are designed for use by technology. Data catalogs should be designed for data consumers as much as for technologists. In addition, a data catalog should be accessible via computer, tablet and mobile applications.
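
The metadata discovery, profiling and relationship steps above can be illustrated with a small sketch. It is not a real catalog product: it assumes the sources are SQLite databases, and the function names (`harvest_metadata`, `related_by_column`), the entry layout and the sample tables are all hypothetical. It loads only metadata and row counts, never the rows themselves, and treats a shared column name as a rough hint that two tables hold related data.

```python
import sqlite3
from collections import defaultdict


def harvest_metadata(source_name, conn):
    """Read table and column metadata (not the rows themselves) from one SQLite source."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2]}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # Profile statistic: a simple row count per table.
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        entries.append(
            {"source": source_name, "table": table, "columns": columns, "row_count": row_count}
        )
    return entries


def related_by_column(entries):
    """Group tables that share a column name (a crude relationship hint, not proof)."""
    by_column = defaultdict(set)
    for entry in entries:
        for column in entry["columns"]:
            by_column[column["name"].lower()].add((entry["source"], entry["table"]))
    return {col: sorted(tables) for col, tables in by_column.items() if len(tables) > 1}


# Two throwaway in-memory databases stand in for separate enterprise systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (invoice_id INTEGER, customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (10, 1, 99.5)")

catalog = harvest_metadata("crm", crm) + harvest_metadata("billing", billing)
related = related_by_column(catalog)

for entry in catalog:
    print(entry["source"], entry["table"], entry["row_count"])
print(related)
```

In practice, matching on column names alone would produce false positives (for example, a generic `id` column), which is one reason the business data stewards are needed to confirm which sources are actually related.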

Building a data catalog is a task that should involve teams from both IT and the business to ensure that it focuses on users' needs and enables them to manage metadata across the enterprise. Effective data catalog planning, development and implementation can bring metadata management into the business community and provide lasting value for data assets.

What are your tips for building a data catalog?

Great pointers, Anne. What are your thoughts on operationalizing such a data catalogue? Data platforms have become more hybrid than ever before, and a data catalogue is a way forward, but consolidation and governance of an enterprise-wide data catalogue will be successful only if it is truly operational (i.e., accessible via various channels).


Achal Patel