Definition

data dredging (data fishing)

Data dredging, sometimes referred to as "data fishing," is a data mining practice in which large volumes of data are analyzed in search of any possible relationships among the variables. The traditional scientific method, in contrast, begins with a hypothesis and then examines the data to test it. Sometimes conducted for unethical purposes, data dredging often circumvents traditional data mining techniques and may lead to premature conclusions. Data dredging is sometimes described as "seeking more information from a data set than it actually contains."

Data dredging sometimes results in relationships between variables being announced as significant when, in fact, the data require more study before such an association can legitimately be established. Many variables may be related through chance alone; others may be related through some unknown factor. To make a valid assessment of the relationship between any two variables, further study is required in which isolated variables are contrasted with a control group. Data dredging is sometimes used to present an unexamined concurrence of variables as if it led to a valid conclusion, prior to any such study.
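The point that many variables may be related through chance alone can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function names `pearson` and `dredge`, the sample sizes and the seed are all hypothetical choices, and the threshold 0.361 is the approximate two-tailed 5% critical value of the correlation coefficient for 30 observations. Every variable is independent random noise, yet some candidates still clear the threshold.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge(n_obs=30, n_vars=200, threshold=0.361, seed=0):
    """Correlate one random target against many unrelated random variables.

    Returns (index, r) for every candidate whose |r| exceeds the threshold.
    The threshold is an assumed approximate 5% critical value for n_obs=30.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = []
    for i in range(n_vars):
        # Each candidate is pure noise, generated independently of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) > threshold:
            hits.append((i, r))
    return hits


hits = dredge()
print(f"{len(hits)} of 200 unrelated variables look 'significant'")
```

With a 5% threshold and 200 candidates, roughly ten spurious "relationships" would be expected on average, even though none of the variables is actually related to the target.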

Although data dredging is often used improperly, it can be a useful means of finding surprising relationships that might not otherwise have been discovered. However, because the concurrence of variables does not by itself establish how they are related (the concurrence could, after all, be merely coincidental), further analysis is required to yield any useful conclusions.
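One simple form of further analysis is to treat dredged relationships as hypotheses and retest them on data that played no part in finding them. The following is a rough sketch of that idea, not a prescribed method: the name `explore_then_confirm`, the 30/30 split, the seed and the 0.361 threshold (an approximate two-tailed 5% critical value for 30 observations) are all assumptions made for illustration. Because the data here are pure noise, few or none of the relationships found in the first half are expected to hold up in the second.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def explore_then_confirm(n_obs=60, n_vars=200, threshold=0.361, seed=0):
    """Dredge the first half of the data, then retest hits on the second half.

    Returns (found, confirmed): indices flagged during exploration, and the
    subset that also clears the threshold, with the same sign, on held-out data.
    """
    rng = random.Random(seed)
    half = n_obs // 2
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    found, confirmed = [], []
    for i in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r_explore = pearson(target[:half], candidate[:half])
        if abs(r_explore) <= threshold:
            continue  # nothing interesting in the exploration half
        found.append(i)
        r_confirm = pearson(target[half:], candidate[half:])
        # Require the held-out half to agree in both strength and direction.
        if abs(r_confirm) > threshold and r_explore * r_confirm > 0:
            confirmed.append(i)
    return found, confirmed


found, confirmed = explore_then_confirm()
print(f"found in exploration: {len(found)}, confirmed on held-out data: {len(confirmed)}")
```

A held-out retest of this kind does not replace a controlled study, but it filters out many coincidental findings before they are reported.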

Related glossary terms: decision tree, dark data, box plot
This was last updated in October 2010
Posted by: Margaret Rouse


