Data quality management: Problems and horror stories

Data quality management is not easy, as is evidenced by these true horror stories of data quality gone wrong. Learn from (and wince at) these mistakes to avoid problems with your own data quality projects.

The following excerpt from Enterprise Knowledge Management: The Data Quality Approach, by David Loshin, is printed with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2001.

Without even realizing it, everyone is affected by poor data quality. Some are affected directly in annoying ways, such as receiving two or three identical mailings from the same sales organization in the same week. Some are affected in less direct ways, such as the 20-minute wait on hold for a customer service department. Some are affected more malevolently through deliberate fraud, such as identity theft. But whenever poor data quality, inconsistencies, and errors bloat both companies and government agencies and hamper their ability to provide the best possible service, everyone suffers.

Data quality seems to be a hazy concept, but the lack of data quality severely hampers the ability of organizations to effectively accumulate and manage enterprise-wide knowledge. The goal of this book is to demonstrate that data quality is not an esoteric notion but something that can be quantified, measured, and improved, all with a strict focus on return on investment. Our approach is that knowledge management is a pillar that must stand securely on a pedestal of data quality, and by the end of this book, the reader should be able to build that pedestal.

This book covers these areas.

  • Data ownership paradigms
  • The definition of data quality
  • An economic framework for data quality, including steps in building a return on investment model to justify the costs of a data quality program
  • The dimensions of data quality
  • Using statistical process control as a tool for measurement
  • Data domains and mappings between those domains
  • Data quality rules and business rules
  • Measurement and current state assessment
  • Data quality requirements analysis
  • Metadata and policy
  • Rules-based processing
  • Discovery of metadata and data quality and business rules
  • Data cleansing
  • Root cause analysis and supplier management
  • Data enhancement
  • Putting it all into practice

The end of the book summarizes the processes discussed and the steps to building a data quality practice.

Before we dive into the technical components, however, it is worthwhile to spend some time looking at some real-world examples for motivation. In the next section, you will see some examples of "data quality horror stories" — tales of adverse effects of poor data quality.

Data quality horror stories

Bank Deposit?

In November 1998, the Associated Press reported that a New York man allegedly brought a dead deer into a bank in Stamford, Connecticut, because he was upset with the bank's service. According to police, the 70-year-old had argued with a teller over a clerical mistake involving his checking account. Apparently still unhappy with the teller, he went home, got the deer carcass, and brought it back to the branch office.

CD Mail Fraud

Here is a news story taken from the Associated Press newswire. The text is printed with permission.

Newark — For four years a Middlesex County man fooled the computer fraud programs at two music-by-mail clubs, using 1,630 aliases to buy music CDs at rates offered only to first-time buyers.

David Russo, 33, of Sayreville, NJ, admitted yesterday that he received 22,260 CDs by making each address — even if it listed the same post office box — different enough to evade fraud-detection computer programs.

Among his methods: adding fictitious apartment numbers, unneeded direction abbreviations and extra punctuation marks. (Emphasis mine)

The scam is believed to be the largest of its kind in the nation, said Assistant U.S. Attorney Scott S. Christie, who prosecuted the case.

The introductory offers typically provided nine free CDs with the purchase of one CD at the regular price, plus shipping and handling. Other CDs then had to be purchased later to fulfill club requirements. Russo paid about $56,000 for CDs, said Paul B. Brickfield, his lawyer, or an average of $2.50 each. He then sold the CDs at flea markets for about $10 each, Brickfield said. Russo pleaded guilty to a single count of mail fraud. He faces about 12 to 18 months in prison and a fine of up to $250,000.
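The scheme worked because the clubs' programs treated cosmetically different addresses as different customers. A common countermeasure is to reduce every address to a normalized match key before comparing records. The Python sketch below is a hypothetical illustration, not the clubs' actual software: the directional and unit-designator lists and the `match_key` and `group_by_key` functions are assumptions made for this example, and production systems rely on postal reference data rather than a few hand-written rules.

```python
import re
from collections import defaultdict

# Hypothetical rule sets for this sketch only; real address standardization
# relies on postal reference data and official abbreviation tables.
DIRECTIONALS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
                "north", "south", "east", "west"}
UNIT_DESIGNATORS = {"apt", "apartment", "unit", "suite", "ste"}


def match_key(address: str) -> str:
    """Reduce an address to a coarse key so cosmetic variants collide."""
    text = address.lower().replace("#", " apt ")  # treat '#7' like 'apt 7'
    text = re.sub(r"[^\w\s]", "", text)            # delete punctuation
    tokens = iter(text.split())
    kept = []
    for tok in tokens:
        if tok in UNIT_DESIGNATORS:
            next(tokens, None)  # also skip the unit number that follows
        elif tok not in DIRECTIONALS:
            kept.append(tok)
    return " ".join(kept)


def group_by_key(addresses):
    """Group raw address strings that share the same match key."""
    groups = defaultdict(list)
    for address in addresses:
        groups[match_key(address)].append(address)
    return dict(groups)


variants = [
    "123 Main Street, Anytown, NY 11787",
    "123 N. Main Street, Apt. 7, Anytown, NY 11787",
    "123 Main Street #7, Anytown, N.Y. 11787",
]
for key, members in group_by_key(variants).items():
    print(f"{key!r}: {len(members)} variants")
# prints: '123 main street anytown ny 11787': 3 variants
```

The trade-off shows up even in this toy version: dropping directionals makes "123 N Main Street" and "123 S Main Street" collide although they may be different buildings, so a matching key is better treated as a flag for review than as proof of fraud.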

Mars Orbiter

The Mars Climate Orbiter, a key part of NASA's program to explore Mars, vanished in September 1999 after its rockets were fired to bring it into orbit around the planet. An investigative board later found that NASA engineers had failed to convert English units of rocket thrust into newtons, the metric unit of force, and that this error was the root cause of the spacecraft's loss. Instead of reaching a safe orbit, the orbiter is believed to have been destroyed in the Martian atmosphere.

This discrepancy between the two measures, which was relatively small, caused the orbiter to approach Mars at too low an altitude. The result was the loss of a $125 million spacecraft and a significant setback in NASA's ability to explore Mars.
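The arithmetic behind this kind of mix-up is simple. One pound-force is defined as 4.4482216152605 newtons, so a value computed in pound-force units but read as if it were metric is understated by a factor of about 4.45. The Python sketch below shows one defensive pattern: carry the unit alongside every value and convert explicitly. It uses impulse (force multiplied by time) as the example quantity. The `Impulse` class and the sample value are hypothetical and are not taken from any NASA software.

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 (both factors are exact by definition)
LBF_TO_N = 0.45359237 * 9.80665  # about 4.4482216 N per lbf


@dataclass(frozen=True)
class Impulse:
    """A hypothetical unit-tagged quantity: the unit travels with the number."""
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def in_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_N
        raise ValueError(f"unknown unit: {self.unit}")


reported = Impulse(10.0, "lbf*s")     # hypothetical value
correct = reported.in_newton_seconds()
naive = reported.value                # the bug: bare number assumed to be N*s
print(f"correct: {correct:.2f} N*s, naive: {naive:.2f} N*s, "
      f"ratio: {correct / naive:.3f}")
# prints: correct: 44.48 N*s, naive: 10.00 N*s, ratio: 4.448
```

Because the conversion lives in a single method, a value with an unexpected unit raises an error instead of silently flowing into a trajectory calculation.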

Credit Card Woes

After I had been a loyal credit card customer for a number of years, I mistakenly missed a payment when the bill was lost during the move to our new house. I called the customer service department and explained the omission, and they were happy to remove the service charge, provided that I sent in my payment right away, which I did.

A few months later, I received a letter indicating that "immediate action" was required. Evidently, I had a balance due of $0.00, and because of that, the company had decided to revoke my charging privileges! Not only that, I was being reported to credit agencies as being delinquent.

Needless to say, this was ridiculous. After some intense conversations with a number of people in the customer service department, the company agreed to mark my account as paid in full and notified the credit reporting agencies that I was not, and never had been, delinquent on the account (see Figure 1.1).

Figure 1.1: Mysterious bill
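A status that contradicts the balance it is supposed to reflect is exactly the kind of error a simple data quality rule can catch before a letter or a credit report goes out. The Python sketch below is a hypothetical rule, not the card issuer's logic; the `Account` fields, the status values, and the account identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Account:
    account_id: str
    balance_due: Decimal
    status: str  # hypothetical values: "current", "delinquent", "revoked"


PENALTY_STATUSES = {"delinquent", "revoked"}


def violates_zero_balance_rule(account: Account) -> bool:
    """Rule: an account that owes nothing must not carry a penalty status."""
    return account.balance_due <= 0 and account.status in PENALTY_STATUSES


accounts = [
    Account("A-001", Decimal("0.00"), "revoked"),     # the situation in the story
    Account("A-002", Decimal("125.40"), "delinquent"),
    Account("A-003", Decimal("0.00"), "current"),
]
flagged = [a.account_id for a in accounts if violates_zero_balance_rule(a)]
print(flagged)  # ['A-001']
```

Using `Decimal` rather than `float` keeps currency amounts exact, so a balance of $0.00 compares as exactly zero.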

Open or Closed Account?

Three months after I canceled my cellular telephone service, I continue to receive bills from my former service provider indicating that I am being billed for $0.00, marked "Do not remit."

Business Credit Card

A friend of mine is the president of a small home-based business. He received an offer from a major charge card company for a corporate charge card with no annual fee. He accepted, and a short time later, he received his card in the mail. Not long after that, he began to receive the same offer from the same company, but those offers were addressed differently. Evidently, his name had been misspelled on one of his magazine subscriptions, and that version had been submitted to the credit card company as a different individual. Not only that, his wife started to receive offers too.

Six months later, this man still receives four or five offers a week in the mail from the same company, which evidently not only cannot figure out who he is but also cannot recognize that he is already a customer!
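The company's problem is one of record linkage: deciding whether two records with slightly different names refer to the same person. The Python sketch below uses the standard library's `difflib.SequenceMatcher` as a generic string-similarity measure. The record layout, the sample names and address, the `probably_same_person` rule, and the 0.85 threshold are all hypothetical choices; a real matching system would weigh many more attributes.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def probably_same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: same address and a near-identical name."""
    same_address = rec_a["address"].lower() == rec_b["address"].lower()
    return same_address and name_similarity(rec_a["name"], rec_b["name"]) >= threshold


customer = {"name": "Jonathan Smithers", "address": "12 Oak Lane"}
prospect = {"name": "Jonathon Smithers", "address": "12 Oak Lane"}  # misspelled
stranger = {"name": "Robert Jones", "address": "12 Oak Lane"}

print(probably_same_person(customer, prospect))  # True
print(probably_same_person(customer, stranger))  # False
```

A rule like this should not merge his wife's record: she is a different person at the same address, which calls for household-level matching rather than a looser name threshold.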

Direct Marketing

One would imagine that if any business had data quality at the top of its list, it would be the direct marketing industry. Yet I recently received two identical pieces of mail on the same day from the local chapter of an association for the direct marketing industry. One was addressed this way:

David Loshin
123 Main Street
Anytown, NY 11787

Dear David, ...

The other was addressed like this:

Loshin David
123 Main Street
Anytown, NY 11787

Dear Loshin, ...
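Both letters went to the same person at the same address; only the order of the name parts differed. A minimal sketch of a duplicate check that ignores name order, assuming names are simple space-separated tokens, is shown below in Python. The `name_key` and `mailing_key` functions are hypothetical. Sorting tokens can also merge genuinely different people whose names are permutations of each other (a "Scott Thomas" and a "Thomas Scott" in the same building, say), which is why the sketch combines the name with the address.

```python
def name_key(name: str) -> tuple[str, ...]:
    """Order-insensitive name key: lowercase tokens, sorted."""
    return tuple(sorted(name.lower().replace(",", " ").split()))


def mailing_key(name: str, street: str, city_line: str) -> tuple:
    """Hypothetical de-duplication key combining name and address."""
    return (name_key(name), street.strip().lower(), city_line.strip().lower())


first = mailing_key("David Loshin", "123 Main Street", "Anytown, NY 11787")
second = mailing_key("Loshin David", "123 Main Street", "Anytown, NY 11787")
print(first == second)  # True: one mailing, not two
```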

Tracking Backward

I recently ordered some computer equipment and was given a tracking number to follow the package's progress from the source to my house. Table 1.1 (slightly modified from the original) shows that the package was scanned at the exit hub in a specific state on June 26 and was (evidently) scanned in Nassau County, NY, at 12:30 A.M. the following day. Yet it was then scanned as departing from the airport in the same state as the exit hub at 1:43 P.M., more than 13 hours later. The rest of the tracking makes sense: from the XX airport to an airport local to my area, then to my locality, and finally to the delivery point.

Obviously, the June 27, 12:30 A.M. scan in Nassau has either the incorrect location or the incorrect time. It is most likely the incorrect time, since packages are scanned on entry to a location and on exit, and this scan appears between the location scan at EXIT HUB and the departure scan at ANYTOWN INTL, same state.

Table 1.1: Tracking history for the equipment I ordered

Date            Time         Location            Activity
June 28, 2000   5:25 P.M.    Nassau, NY
                3:42 P.M.    Nassau, NY US
                3:31 P.M.                        Location scan
June 27, 2000   11:21 P.M.   Newark Intl, NJ     Departure scan
                4:45 P.M.    Newark Intl, NJ
                1:43 P.M.    Anytown Intl, XX    Departure scan
                12:30 A.M.   Nassau, NY US
June 26, 2000   11:29 A.M.   Exit hub, XX US     Location scan
June 23, 2000   9:11 P.M.    Addison, IL US      Location scan
                1:38 P.M.                        Shipment data
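The reasoning about the June 27, 12:30 A.M. scan can be turned into an automated consistency check: after sorting scans by time, flag any scan whose region differs from both of its neighbors while those neighbors share a region (XX, then NY, then XX). The Python sketch below is a hypothetical rule, not the carrier's logic. The scans are taken from Table 1.1, and the region codes and the `find_out_of_sequence` function are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Scan:
    when: datetime
    location: str
    region: str  # hand-assigned state/region code used by the check


def find_out_of_sequence(scans: list[Scan]) -> list[Scan]:
    """Flag scans sandwiched between two scans from one other region."""
    ordered = sorted(scans, key=lambda s: s.when)
    suspects = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        if prev.region == nxt.region != cur.region:
            suspects.append(cur)
    return suspects


history = [
    Scan(datetime(2000, 6, 23, 21, 11), "Addison, IL US", "IL"),
    Scan(datetime(2000, 6, 26, 11, 29), "Exit hub, XX US", "XX"),
    Scan(datetime(2000, 6, 27, 0, 30), "Nassau, NY US", "NY"),
    Scan(datetime(2000, 6, 27, 13, 43), "Anytown Intl, XX", "XX"),
    Scan(datetime(2000, 6, 27, 16, 45), "Newark Intl, NJ", "NJ"),
    Scan(datetime(2000, 6, 27, 23, 21), "Newark Intl, NJ", "NJ"),
    Scan(datetime(2000, 6, 28, 15, 42), "Nassau, NY US", "NY"),
]
for s in find_out_of_sequence(history):
    print(s.when, s.location)  # prints: 2000-06-27 00:30:00 Nassau, NY US
```

Like the author's own reasoning, the rule only says that something is wrong with that scan; it cannot tell whether the location or the timestamp is the faulty field.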


These are just a few stories culled from personal experience, interactions with colleagues, or reading the newspaper. Yet, who has not been subject to some kind of annoyance that can be traced to a data quality problem?
