Data dredging, sometimes referred to as "data fishing," is a data mining practice in which large volumes of data are analyzed in search of any possible relationships among variables. The traditional scientific method, in contrast, begins with a hypothesis and then examines the data to test it. Sometimes conducted for unethical purposes, data dredging often circumvents traditional data mining techniques and may lead to premature conclusions. It is sometimes described as "seeking more information from a data set than it actually contains."
Data dredging sometimes results in relationships between variables being announced as significant when, in fact, the data require more study before such an association can legitimately be established. Many variables may be related through chance alone, and the more pairs of variables are compared, the more such chance relationships will appear; others may be related through some unknown factor. To make a valid assessment of the relationship between any two variables, further study is required in which the variables are isolated and compared against a control group. Data dredging is sometimes used to present an unexamined concurrence of variables as if it supported a valid conclusion, prior to any such study.
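The point about chance relationships can be illustrated with a small simulation. The sketch below is illustrative only and is not taken from any particular study: it generates variables that are pure, independent noise, compares every pair, and counts how many pairs would pass a conventional 5% significance test. The function names, the number of variables, the number of observations, and the seed are all assumptions chosen for the example; the cutoff of 0.361 is the approximate two-sided 5% critical value of the Pearson correlation for 30 observations.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_hits(n_vars=40, n_obs=30, r_crit=0.361, seed=0):
    """Count pairs of independent noise variables whose |r| exceeds r_crit.

    All parameters are hypothetical example values; r_crit is the
    approximate 5% two-sided critical value for n_obs = 30.
    """
    rng = random.Random(seed)
    # Every variable is independent Gaussian noise, so no real
    # relationship exists between any two of them.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(range(n_vars), 2))
    hits = sum(
        1 for i, j in pairs if abs(pearson_r(data[i], data[j])) > r_crit
    )
    return hits, len(pairs)


hits, n_pairs = count_chance_hits()
print(f"{hits} of {n_pairs} unrelated pairs look 'significant' at the 5% level")
```

With 40 variables there are 780 pairs to compare, so roughly 5% of them can be expected to cross the threshold even though none of the variables are actually related. The exact count depends on the seed.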
Although data dredging is often used improperly, it can be a useful means of finding surprising relationships that might not otherwise have been discovered. However, because the mere concurrence of variables does not establish how they are related (the concurrence could, after all, be coincidental), further analysis is required to yield any useful conclusions.
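One simple form of further analysis is to re-test a dredged relationship on data that played no part in finding it. The sketch below is a minimal illustration under stated assumptions, not a substitute for a controlled study: it uses pure noise variables, picks the strongest-looking pair on one half of the observations, and then measures the same pair on the other half. The function names, sample sizes, and seed are hypothetical.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge_then_validate(n_vars=40, n_obs=60, seed=1):
    """Find the strongest pair on one half of the data, re-test on the other.

    All parameters are hypothetical example values.
    """
    rng = random.Random(seed)
    # Independent Gaussian noise: any relationship found is coincidental.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    half = n_obs // 2

    def r_on(i, j, start, stop):
        return pearson_r(data[i][start:stop], data[j][start:stop])

    # Dredge: keep whichever pair looks strongest on the exploration half.
    best = max(
        itertools.combinations(range(n_vars), 2),
        key=lambda pair: abs(r_on(pair[0], pair[1], 0, half)),
    )
    r_explore = r_on(best[0], best[1], 0, half)
    # Validate: measure that one pre-chosen pair on the unseen half.
    r_holdout = r_on(best[0], best[1], half, n_obs)
    return best, r_explore, r_holdout


best, r_explore, r_holdout = dredge_then_validate()
print(f"pair {best}: r = {r_explore:.2f} when dredged, {r_holdout:.2f} on held-out data")
```

Because the pair was selected for looking strong, its correlation on the exploration half overstates the truth; on pure noise the held-out value will usually be much closer to zero. A relationship that survives such a check is still only a candidate for the kind of controlled study described above.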