
What are the main features of data catalog software?

Data catalogs serve as data portals for self-service business analytics users. Learn how the data is curated, which capabilities are included, which vendors are in the market and what benefits the tools provide.

Employees who rely on self-service business analytics tools to make data-driven business decisions need access to a lot of data, but they can't be allowed to just pull raw data out of a data lake or other big data repositories; the data they use must be curated to ensure it is accurate and appropriate. That's where data catalog software comes in.

A data catalog is a type of metadata management system that is user-friendly enough for the average business user. Data catalogs are used to build portals in which users can find data that has been curated by data stewards or other data professionals. They classify the data in terms that business users understand and provide context around the data so it can be used in analytics applications.

This type of metadata management tool is in high demand as businesses struggle to inventory all the data they collect, as well as to comply with data privacy rules, such as the European Union's General Data Protection Regulation.

Analyst firm Gartner recommends the use of data catalog software to curate inventories of available data assets and to map information supply chains. These tools are an essential component of corporate data management strategies, according to the firm.

How data catalog software works

Sharon Graves, enterprise data evangelist and Tableau Server administrator at web hosting giant GoDaddy, implemented data catalog software from Alation Inc. in 2015 to reduce the time analytics users spend searching for the right data and to ensure the data they access has been vetted by data stewards.

"There is a problem where we have users who don't know anything about which data source to use or where to find the data. We needed to point users to a tool," she said. "We wanted our analysts to be spending their time doing analysis, and we wanted to support end users doing simple charting and crosstabs."

[Figure: Data catalog software features checklist. Ten data catalog features to look for from a software vendor.]

The data catalog pulls in metadata from various locations -- Hadoop, Amazon Redshift, Apache Hive, Tableau Server, Teradata and other sources -- and gathers it all in a portal where users can search for relevant data. It ranks the data based on a number of factors, including whether a data steward has endorsed the data for use in certain applications and how popular the data is -- a ranking that data experts can adjust to ensure the right data surfaces first, Graves said. Data teams can also build unified or packaged data sets that take care of data joins for users, she added.
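As a rough illustration of the ranking Graves describes, the Python sketch below orders search results so that steward-endorsed data sets come first, then sorts by popularity plus a manual adjustment. It is not Alation's implementation; the CatalogEntry fields, the steward_boost adjustment and the sample entries are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical metadata record; field names are assumptions."""
    name: str
    source: str              # where the metadata was harvested from
    description: str
    endorsed: bool = False   # set by a data steward
    query_count: int = 0     # raw popularity signal
    steward_boost: int = 0   # manual adjustment made by data experts


def search(entries, term):
    """Return matching entries, endorsed first, then by adjusted popularity."""
    term = term.lower()
    matches = [
        e for e in entries
        if term in e.name.lower() or term in e.description.lower()
    ]
    return sorted(
        matches,
        key=lambda e: (e.endorsed, e.query_count + e.steward_boost),
        reverse=True,
    )


# Made-up sample metadata gathered from several sources.
catalog = [
    CatalogEntry("orders_raw", "Hive", "Raw order events", query_count=900),
    CatalogEntry("orders_curated", "Redshift", "Vetted order facts",
                 endorsed=True, query_count=300),
    CatalogEntry("orders_daily", "Tableau Server", "Daily order summary",
                 endorsed=True, query_count=100, steward_boost=500),
    CatalogEntry("customers", "Teradata", "Customer master",
                 endorsed=True, query_count=700),
]

results = [e.name for e in search(catalog, "order")]
print(results)
```

In this sketch the heavily queried but unendorsed raw table ranks last, and the boosted daily summary ranks ahead of the other endorsed data set despite having fewer queries.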

According to Gartner, traditional metadata management capabilities, including business glossaries, data lineage and impact analysis, are at the core of data catalog software, alongside modern features such as self-generating topic extraction, taxonomy generation, semantic discovery, machine-learning pattern mapping and knowledge graphing. All in all, data catalogs enable companies to get the most value out of the data that sits in data lakes by making it easy to find and apply in business analysis.
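To make data lineage and impact analysis more concrete, the following Python sketch walks a small lineage graph to list every data asset that sits downstream of a changed source. It is only one simple way to model the idea, not how any particular vendor does it; the LINEAGE mapping and the asset names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its values.
LINEAGE = {
    "crm.accounts": ["warehouse.customers"],
    "web.clickstream": ["warehouse.sessions"],
    "warehouse.customers": ["mart.revenue_by_customer", "dash.churn"],
    "warehouse.sessions": ["dash.churn"],
    "mart.revenue_by_customer": ["dash.executive_summary"],
}


def downstream_impact(lineage, changed):
    """Breadth-first walk returning every asset downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


impacted = downstream_impact(LINEAGE, "crm.accounts")
print(impacted)
```

A change to the hypothetical crm.accounts source would flag the customer table, the revenue mart and both dashboards, while the clickstream branch is left untouched.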

In addition to Alation, vendors that offer data catalog software, either as part of their metadata management tools or as stand-alone products, include Attivio, Cambridge Semantics, Collibra, Informatica, Microsoft, Oracle, SAP and Waterline Data.
