Employees who rely on self-service business analytics tools to make data-driven business decisions need access to large volumes of data, but they can't be allowed to simply pull raw data out of data lakes or other big data repositories; the data they use must be curated to ensure it is accurate and appropriate. That's where data catalog software comes in.
A data catalog is a type of metadata management system that is user-friendly enough for the average business user. Data catalogs are used to build portals in which users can find data that has been curated by data stewards or other data professionals. They classify the data in terms that business users understand and provide context around the data so it can be used in analytics applications.
This type of metadata management tool is in high demand as businesses struggle to inventory all the data they collect, as well as to comply with data privacy rules, such as the European Union's General Data Protection Regulation.
Analyst firm Gartner recommends the use of data catalog software to curate inventories of available data assets and to map information supply chains. These tools are an essential component of corporate data management strategies, according to the firm.
How data catalog software works
Sharon Graves, enterprise data evangelist and Tableau Server administrator at web hosting giant GoDaddy, implemented data catalog software from Alation Inc. in 2015 to reduce the time analytics users spend searching for the right data and to ensure the data they access has been vetted by data stewards.
"There is a problem where we have users who don't know anything about which data source to use or where to find the data. We needed to point users to a tool," she said. "We wanted our analysts to be spending their time doing analysis, and we wanted to support end users doing simple charting and crosstabs."
The data catalog pulls in metadata from various locations -- Hadoop, Amazon Redshift, Apache Hive, Tableau Server, Teradata and other sources -- and gathers it all in a portal where users can search for relevant data. It ranks the search results based on a number of factors, including whether a data steward has endorsed the data for use in certain applications and how popular the data is. Data experts can adjust that popularity ranking to ensure the right data surfaces first, Graves said. Data teams can also build unified or packaged data sets that take care of data joins for users, she added.
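The ranking behavior Graves describes can be illustrated with a minimal Python sketch. It is not Alation's implementation or API; the CatalogEntry class, its fields (endorsed, query_count, boost), the search function and the sample data sets are all hypothetical names and values chosen for illustration. The sketch assumes that endorsed data always ranks above unendorsed data and that popularity is a simple usage count that data experts can adjust by hand.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data set in the catalog."""
    name: str
    source: str            # system the metadata was harvested from
    description: str
    endorsed: bool = False  # set by a data steward after vetting
    query_count: int = 0    # assumed popularity signal: how often it is used
    boost: int = 0          # assumed manual adjustment made by data experts


def search(entries, term):
    """Return entries matching the term, endorsed and popular ones first."""
    term = term.lower()
    matches = [
        e for e in entries
        if term in e.name.lower() or term in e.description.lower()
    ]
    # Sort key: endorsed entries first, then adjusted popularity (descending),
    # then name as a stable tie-breaker.
    return sorted(
        matches,
        key=lambda e: (not e.endorsed, -(e.query_count + e.boost), e.name),
    )


# Made-up sample metadata gathered from several source systems.
catalog = [
    CatalogEntry("orders_raw", "Hadoop", "Raw order events", query_count=900),
    CatalogEntry("orders_curated", "Amazon Redshift", "Vetted order facts",
                 endorsed=True, query_count=300),
    CatalogEntry("orders_summary", "Teradata", "Daily order totals",
                 endorsed=True, query_count=100, boost=500),
    CatalogEntry("customers", "Apache Hive", "Customer profiles",
                 endorsed=True, query_count=50),
]

results = [e.name for e in search(catalog, "order")]
print(results)
```

In this sketch, the heavily used but unvetted orders_raw data set lands last among the matches, and the manual boost lifts orders_summary above orders_curated even though it is queried less often. A real catalog would likely weigh more signals than these two.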
According to Gartner, data catalog software has traditional metadata management capabilities at its core, including business glossaries, data lineage and impact analysis, and adds modern features such as self-generating topic extraction, taxonomy generation, semantic discovery, machine-learning pattern mapping and knowledge graphing. All in all, data catalogs enable companies to get the most value out of the data that sits in data lakes by making it easy to find and apply in business analysis.
In addition to Alation, other vendors offer data catalog software either as part of their metadata management tools or as stand-alone offerings, including Attivio, Cambridge Semantics, Collibra, Informatica, Microsoft, Oracle, SAP and Waterline Data.