Definition

data profiling

Data profiling, also called data archeology, is the statistical analysis of the data values within a data set to assess their quality in terms of consistency, uniqueness and logic.

The insight gained from data profiling can be used to determine how difficult it will be to use existing data for other purposes. It can also provide metrics for assessing data quality and show whether metadata accurately describes the actual values in the source data. Data profiling cannot identify inaccurate data; it can only identify business rule violations and anomalies.
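As a rough illustration of the kind of metrics described above, the following sketch computes completeness and uniqueness for a single column and checks the values against a declared type. The `profile_column` helper, the sample `ages` column and its declared type are hypothetical and are not taken from any particular profiling tool.

```python
def profile_column(values, declared_type):
    """Return simple quality metrics for one column (hypothetical helper)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    # Completeness: share of rows that have a value at all.
    completeness = len(non_null) / total if total else 0.0
    # Uniqueness: share of non-null values that are distinct.
    uniqueness = len(set(non_null)) / len(non_null) if non_null else 0.0
    # Metadata check: values that do not match the type the metadata declares.
    mismatches = [v for v in non_null if not isinstance(v, declared_type)]
    return {
        "completeness": completeness,
        "uniqueness": uniqueness,
        "type_mismatches": mismatches,
    }

# Assumed sample: an "age" column whose metadata declares it as an integer.
ages = [34, 41, None, 34, "unknown", 29]
report = profile_column(ages, int)
print(report)
```

A value such as 34 passes every check here even if the person's real age is 43, which is the sense in which profiling flags anomalies and rule violations rather than inaccurate data.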

Profiling tools evaluate the actual content, structure and quality of the data by exploring relationships that exist between value collections both within and across data sets. For example, by examining the frequency distribution of different values for each column in a table, an analyst can gain insight into the type and use of each column. Cross-column analysis can be used to expose embedded value dependencies, and inter-table analysis allows the analyst to discover overlapping value sets that may represent foreign key relationships between entities.
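The three kinds of analysis described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `orders` and `customers` tables, their column names and the helper functions are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample tables, represented as lists of dicts.
orders = [
    {"order_id": 1, "customer_id": 10, "zip": "02139", "city": "Cambridge"},
    {"order_id": 2, "customer_id": 11, "zip": "10001", "city": "New York"},
    {"order_id": 3, "customer_id": 10, "zip": "02139", "city": "Cambridge"},
    {"order_id": 4, "customer_id": 12, "zip": "10001", "city": "Newark"},
]
customers = [{"id": 10}, {"id": 11}, {"id": 13}]

def value_frequencies(rows, column):
    """Column analysis: how often each value occurs in one column."""
    return Counter(row[column] for row in rows)

def dependency_violations(rows, determinant, dependent):
    """Cross-column analysis: determinant values that map to more than
    one dependent value, breaking the suspected dependency."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: vals for key, vals in seen.items() if len(vals) > 1}

def value_overlap(child_rows, child_col, parent_rows, parent_col):
    """Inter-table analysis: fraction of distinct child values that also
    appear in the parent column (a high value hints at a foreign key)."""
    child = {row[child_col] for row in child_rows}
    parent = {row[parent_col] for row in parent_rows}
    return len(child & parent) / len(child) if child else 0.0

zip_freq = value_frequencies(orders, "zip")
violations = dependency_violations(orders, "zip", "city")
overlap = value_overlap(orders, "customer_id", customers, "id")

print(zip_freq)
print(violations)
print(overlap)
```

In this sample, the zip code 10001 is paired with two different cities, so the suspected zip-to-city dependency does not hold for every row, and only two of the three distinct customer_id values appear in the customers table. Whether either finding is an error is a question for an analyst; the sketch only surfaces the anomaly.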

See also: data modeling, data dictionary, data deduplication

Related glossary terms: data analytics (DA)
This was last updated in February 2011
Posted by: Margaret Rouse
