Data quality management tips and best practices

Get data quality management tips and best practices with expert advice and book excerpts. Learn about academic sources for data quality, key steps and common mistakes of data quality implementations, how much you should spend on data cleansing and more.


eBook: Tactical data quality: How to improve DQ with a tight budget

This section highlights valuable data quality expert advice and resources from SearchDataManagement.com. Get data quality management tips and best practices from Q&As and book excerpts. Learn about academic sources for data quality, key steps and common mistakes of data quality implementations, how much companies should spend on data cleansing and more.


Don't miss the other installments in this data quality management guide

  • Managing data quality efforts during a recession
  • Trends in the data quality market
  • Avoiding data quality pitfalls and using data quality tools for discovering new opportunities
  • Q/A: Identifying data quality problems with a data quality assessment
  • FAQ: Best practices/tips for data quality



David Loshin

David Loshin, President, Knowledge Integrity, Inc.

David Loshin is the president of Knowledge Integrity, Inc., a consulting company focusing on customized information management solutions including information quality consulting and training, business intelligence, metadata and data standards management. David is an industry thought leader and one of Knowledge Integrity's most recognized experts in information management. He writes for many industry publications, creates and teaches courses for The Data Warehousing Institute and other venues, and regularly presents at the annual DAMA/Meta Data conference. David is the author of "Enterprise Knowledge Management - The Data Quality Approach," which describes a revolutionary strategy for defining, managing, and implementing business rules affecting data quality management. His other book, "Business Intelligence: The Savvy Manager's Guide," has been hailed as a leading BI resource. He can be reached via his website, knowledge-integrity.com.

Evan Levy

Evan Levy, Partner and Co-Founder, Baseline Consulting

Evan Levy is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in business analytics and data integration. Evan is actively involved in guiding projects and delivering solutions to Baseline's Fortune 1000 family of clients and has delivered data integration strategies at companies like Microsoft, American Express, and Verizon. Considered an industry leader on the topic of data integration and management, Evan advises vendors and VC firms on new and emerging product strategies. He is a faculty member of The Data Warehousing Institute and a contributor to several leading industry publications and portals. Evan also writes the Inside IT blog on Baseline's website. He is co-author of the book Customer Data Integration: Reaching a Single Version of the Truth which was the first book published on the topic of master data management.

Craig Mullins

Craig Mullins, Former SearchDataManagement.com expert

Craig S. Mullins is a data management strategist, researcher, and consultant who answered Ask the Expert questions for SearchDataManagement.com from 2005-2006. He is currently VP of Data Strategy and corporate technologist at NEON Enterprise Software in Sugar Land, Texas. Craig has more than two decades of experience in all facets of database systems development, including developing and teaching DB2 and SQL Server classes, systems analysis and design, database administration and systems administration, and data analysis and modeling. He has worked with DB2 for z/OS and OS/390 since Version 1 and has experience working with Microsoft SQL Server, Sybase and IMS. Craig has worked in multiple industries including manufacturing, banking, commercial software development, education, research, utilities, and consulting. Craig was named an IBM Data Champion in 2009.


Where to find new academic resources on data quality best practices

Question: What are the best practices in managing data quality, and where can I find a good source of new academic resources in this area?

David Loshin
SearchDataManagement.com expert
President, Knowledge Integrity, Inc.

A: I have a few suggestions. From the academic perspective, I'd check out the MIT Information Quality group and the University of Arkansas Information Quality program, along with the website for the International Association for Information and Data Quality. DAMA has recently released its DAMA-DMBOK (Data Management Body of Knowledge), which contains a chapter on data quality.

Most of the larger data quality tool vendors share a significant amount of best practices collateral. Also, check out my comprehensive book on Data Quality, Enterprise Knowledge Management – The Data Quality Approach.

 


Common mistakes of data quality management implementations

Question: What do you think are the most common mistakes that companies make when implementing data quality management programs? We are about to begin an enterprise-wide data quality management initiative, and I'm wondering whether there are any common pitfalls that we can avoid.

David Loshin
SearchDataManagement.com expert
President, Knowledge Integrity, Inc.

A: Data quality management is not easy, owing to the size and complexity of organizations. There are several common mistakes that companies make during a data quality program implementation. Here are three specific pitfalls that can turn a data quality management project into a nightmare:

Data quality management mistake No. 1: Expecting the silver bullet

Some organizations think they can buy a packaged solution that will address all data quality issues and immediately make them disappear. This unrealistic hope for a "magic tool" is evidenced by how often people acquire a data quality tool as the first step in setting up their data quality program. Buying software before developing a program is indicative of a reactive environment -- and the misguided thought that data quality is a technology-driven solution. Too often, senior management gets the idea that you can "fix" noncompliant data instead of eliminating the introduction of bad data in the first place.

How often has your organization bought a tool, only to have it still sitting on the shelf in its shrink-wrap months later? Although data quality tools are critical components of a data quality program, you must first question the motivation for purchasing a tool, then the process itself, and consider how much the tool can actually contribute to the effectiveness of the program.

Data quality management mistake No. 2: Not having the right expertise

There is often an expectation that as soon as a data quality program is initiated within an organization, there should be some visible improvement in the data. This is not so. Developing a data quality management program is a strategic undertaking. Its success depends on having both business and technical expertise. This is complicated by the fact that a large part of data quality management, especially at the enterprise level, is advisory.

The close coupling of tools and methods introduces additional complexity to the process. Too often, the data quality manager is viewed as having responsibility for some data quality improvement action without necessarily having either the knowledge or authority to make it happen. The result is an overwhelming feeling that the size of the problem makes its solution unreachable -- and consequently the team has no idea where to begin. The mistake occurs in not bringing in the proper expertise to help get the program off the ground.

Data quality management mistake No. 3: Not accounting for organizational culture changes

Even while attempting to improve the quality of data, we often forget that we must work within an organization's existing culture to achieve our improvement goals. No technology in the world will eliminate data quality problems if there is no understanding of how people's behavior allows the introduction of information flaws in the first place.

The evolution of centralized analytical data warehouses provides a good example. Data sets from numerous source systems are extracted, aggregated and transformed in preparation for loading into the warehouse. The data quality problems emerge when the data sets are merged together (perhaps customer names or account numbers are stored in slightly variant forms, data types might not match on similar columns, values are missing, or fields are incomplete). But without the cooperation of upstream systems owners, data warehousing managers are often helpless to control the quality of incoming data. Stricter data quality needs at the data warehouse demand resource allocation by upstream managers. The problem is that their applications may not directly benefit from the desired improvements -- and this acts as an effective disincentive for upstream managers to cooperate.
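
To see the merge problem in miniature, here is a minimal sketch in Python -- the records and normalization rules are hypothetical -- showing how variant name forms and mismatched data types make one customer look like two until the sources are normalized:

# Hypothetical records for the same customer arriving from two source systems
src_a = {"cust_name": "ACME Corp.", "account": "0001234"}   # account stored as a padded string
src_b = {"cust_name": "Acme Corporation", "account": 1234}  # account stored as an integer

def normalize(rec):
    # Illustrative normalization rules applied before the merge
    name = rec["cust_name"].upper().rstrip(".").replace("CORPORATION", "CORP")
    return {"cust_name": name, "account": int(rec["account"])}

# A naive comparison sees two different customers; the normalized one does not
print(src_a == src_b)                        # False
print(normalize(src_a) == normalize(src_b))  # True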

How to avoid data quality management mistakes

Don't despair, though -- knowing the common pitfalls of data quality management programs can help you avoid them. Here are some guidelines to keep in mind:

  • Exploit the advisory role of data quality teams and use internal procedures to attach responsibility and accountability for data quality improvement to the existing information management authority.
  • Don't forget training in the use of policies and procedures -- especially in the use of acquired tools.
  • Hire professionals with experience in managing data quality projects and programs from the start. These individuals will be able to identify opportunities for tactical successes that together contribute to the strategic success of the program.
  • Engage external experts to help jump-start the improvement process. This will reassure your team that your problems are not unique and will allow you to learn from others' best practices.

 

Data quality management tools: Where to get unbiased information

Question: Where can I find a detailed, free report about the data quality management tools market and a comparative analysis report of various data quality management vendors?

David Loshin
SearchDataManagement.com expert
President, Knowledge Integrity, Inc.

A: If you're looking for unbiased information on data quality management tools, my best recommendation is that you seek out the content available via industry publications and websites, such as The Data Warehousing Institute. Also, certain vendors may make portions of subscription-based analysis (e.g., Gartner Magic Quadrants) available from their websites. For a more comprehensive acquisition effort, I strongly recommend engaging a firm with expertise in the data quality management field to marshal your organization through the requirements analysis/specification, demonstration/proof-of-concept, and assessment phases of the procurement.

 

How to estimate customer data cleansing costs

Questions: How much on average does it cost to clean one customer record? How much should an organization spend on customer data cleansing? Do you know whether there has been any specific reporting or analysis done on this area of customer data quality?

David Loshin
SearchDataManagement.com expert
President, Knowledge Integrity, Inc.

A: The challenge with this question is that underlying its simplicity lie many latent questions whose answers are needed before any kind of customer data cleansing costs analysis can be considered. For example, what data elements constitute the customer record? How many records are there? What are the criteria for declaring a record "clean"? What types of customer data are there? Individuals or organizations? How old are the records? Are they in a single table or scattered across many data assets? What approaches are to be taken for cleansing? There may be studies performed by vendors on the average cost, but I suspect that beneath this question lurk other, more important ones.

To start thinking about the cost of cleansing, consider this example, with residential customer data consisting of first name, last name and telephone number. One can determine whether a single record is "correct" using this algorithm: Call the telephone number, ask to speak with the person whose name shares the record with the telephone number. If the person comes to the phone, ask whether all the values are accurate, and correct those that are not. If there is no one there by that name, the record is incorrect; but, at this point, what can be done to correct it? Either the name is not correct or the number is not correct. The next step in cleaning requires additional information, and if none is available, then the algorithm ends.
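
Here is a minimal sketch of that verification algorithm, with a dictionary standing in for the actual phone call (every name and number is made up):

DIRECTORY = {  # phone number -> the person who actually answers (assumed test data)
    "555-0100": {"first_name": "Ann", "last_name": "Lee"},
}

def verify_record(record):
    """Return a corrected record, or None if the record cannot be verified."""
    answered = DIRECTORY.get(record["phone"])
    if answered is None or answered["last_name"] != record["last_name"]:
        # No one there by that name: the name or the number is wrong, and
        # without additional information the algorithm ends here.
        return None
    # The person came to the phone: accept the confirmed (or corrected) values.
    return {**record, **answered}

print(verify_record({"first_name": "An", "last_name": "Lee", "phone": "555-0100"}))
# -> first name corrected to "Ann"
print(verify_record({"first_name": "Bob", "last_name": "Ray", "phone": "555-0100"}))
# -> None: detectably wrong, but not correctable with the data at hand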

Simplistic? Yes. Accurate? Yes. Cost effective? Depends on the number of records, staff members and telephones. Scalable? Not really. There are alternatives, but reliance on different approaches starts to affect those key considerations. Automated solutions may be more scalable, more costly, less accurate, more complex, require more expertise, and so on.

It may be better to challenge the question, then, and turn it into a different sort of beast by suggesting that we answer these questions first and then look at the different alternatives and their corresponding costs:

  • What business processes are affected by "unclean" customer data?
  • How is "clean" customer data defined?
  • What business benefits can be achieved by cleaning customer data?
  • What level of precision is necessary for those benefits to be achieved?

The level of effort that is reasonable to spend on customer data cleansing must be less than the value of the accrued business benefits, and this provides an upper limit to what could be budgeted for the process.
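
As a back-of-the-envelope sketch with assumed numbers, that upper limit translates directly into a per-record spending ceiling:

# Assumed figures, purely for illustration
expected_benefit = 250_000.0  # annual value of the business benefits of clean data
records_in_scope = 1_000_000  # customer records to be cleansed

# The total cleansing budget must stay below the accrued benefit...
max_total_budget = expected_benefit

# ...which implies a ceiling on the average spend per record.
max_cost_per_record = max_total_budget / records_in_scope
print(f"Spend no more than ${max_cost_per_record:.2f} per record")  # $0.25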

 

Data quality management: Building the business case

Question: When trying to build a business case for a data quality management initiative, what should I focus on in order to justify the investment?

Craig Mullins
Former SearchDataManagement.com expert

A: The first step to building your data quality management business case is to try to quantify the cost of poor-quality data to the business. This needs to be written in business language and not technology-speak. In other words, what is the cost of a lost sale because product information was incorrect? Do you have a way to identify these cases? Even anecdotal evidence can be powerful if you are talking to the manager of the product line that lost the sale.

I realize that the cost of finding these problems can be enormous, too. It can help to bring in an industry expert. I would recommend that you purchase and read any of the several excellent books that Thomas C. Redman has written. These books focus on data quality problems and include facts and figures on the average cost of poor-quality data to businesses. For more in-depth and technical treatments of data quality management issues, I would direct you to books written by Jack Olson and Larry English.

 

Data quality management begins with data governance

Question: I'm beginning a data quality management initiative. What are the most important steps one needs to take in order to ensure the most flawless data quality? Also, which is more important, the application or the people?

Craig Mullins
Former SearchDataManagement.com expert

A: Well, that is a short question that would require a book-length answer to do it justice. Instead of inundating you with information, let me give you a few high-level pieces of advice to get your data quality management initiative moving on the right track.

First of all, you need to be sure that the executives at your company recognize the data quality problem and endorse the need to rectify it. By this I mean a couple of things. Data (perhaps, more accurately, information) needs to be treated as a valuable corporate asset. This means imbuing data with the same value as your other corporate assets – and then treating it accordingly. What are your other corporate assets? Capital, human resources, intellectual property, office buildings and equipment, and so on. You protect, manage, inventory and even model all of these assets (what is an org chart but a model of your human resources?). Executives do not need to be told to manage these assets, but perhaps they do when it comes to data.

Have you defined and inventoried all of the critical data elements needed by your organization? Does your company know where every piece of data is? And yes, I am talking about copied data – even in Excel spreadsheets on your users' desktops.

Only when you know what it is you are dealing with can you ever hope to ensure that it is accurate. With that in mind, how is data governance implemented (if at all) in your organization? Data governance encompasses the people, processes and procedures to create a consistent enterprise view of a company's data in order to increase consistency and confidence in decision making, decrease the risk of regulatory fines, and improve data security. Consider these questions:

  • Does your company have a team of IT professionals focused on data governance?
  • Or do you just have the DBA group, with anything even remotely relating to data getting foisted upon them?
  • Is IT aligned with business so that each data element gets the proper treatment it requires for the business as well as in terms of governmental regulations?
  • Or do you hobble along with IT and business interacting only when necessary to gather program and database specifics?

If you are hobbling, consider working to build a data governance practice before you home in on clearing up all of your data quality problems. Look for a good consultant or two to come in, analyze your organization and give you advice on what you need to do to initiate data governance with an eye to treating data as a corporate asset.

Of course, you can always take some baby steps along the way and do not have to implement a grand data governance practice before doing anything. Procuring and implementing a data profiling tool can help to show you the existing state of your data – and perhaps help you start cleansing it.
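
For illustration, a data profiling pass boils down to something like the following sketch -- column fill rates and distinct values over made-up sample rows. Commercial profiling tools go much further (value patterns, ranges, cross-column rules), but this is the basic idea:

rows = [  # hypothetical customer rows
    {"name": "Ann Lee", "state": "TX", "zip": "77478"},
    {"name": "Bob Ray", "state": "tx", "zip": None},
    {"name": None,      "state": "TX", "zip": "77478"},
]

for col in ("name", "state", "zip"):
    values = [row[col] for row in rows]
    filled = [v for v in values if v is not None]
    print(f"{col}: {len(filled)}/{len(values)} populated, "
          f"distinct values: {sorted(set(filled))}")

# The 'state' column reporting both 'TX' and 'tx' immediately flags an
# inconsistency worth cleansing.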

Good luck!

 

Effective data quality program management: Tips and advice

Question: Starting off at a high level, what makes a data quality project or program successful? Are there ways a program should or shouldn't be structured to be successful?

Evan Levy
SearchDataManagement.com expert
Partner, Co-Founder, Baseline Consulting

A: One of the biggest challenges when it comes to dealing with data quality is making sure people focus on the fact that data quality is not about data perfection. I find that people are so focused on "How do I make it better? How do I support my business application users?" that they get very wound up trying to do what amounts to splitting hairs rather than saying, "Wait a second. If the data is perceived to be bad, what is it that we can do to make it better to support what the business is trying to accomplish?"

So, to answer the question "What makes a data quality project or program successful?" -- it's about focusing on what we are trying to fix and, when we have bad data, differentiating error detection from error correction…

…One good example many people like to use is the address. I can actually determine from someone's street address the city and state they're in if the zip code is missing. With data quality tools and data quality techniques, I can actually distill, interpret or calculate what the zip code field is. However, there are circumstances where because data is incomplete I can't correct it or make it better. So, to sum up the question of how you make the program or project successful, you can't sign up to nirvana, you can't sign up to data perfection because you have to differentiate identifying the error before you can focus on correcting it.
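
In practice, the zip code example might look like this minimal sketch, where a hypothetical city/state lookup table stands in for real postal reference data:

CITY_STATE_TO_ZIP = {("SUGAR LAND", "TX"): "77478"}  # assumed reference data

def fill_zip(record):
    if record.get("zip"):
        return record  # nothing to fix
    key = (record.get("city", "").upper(), record.get("state", "").upper())
    zip_code = CITY_STATE_TO_ZIP.get(key)
    if zip_code is None:
        return record  # the error is detectable, but there isn't enough data to correct it
    return {**record, "zip": zip_code}

print(fill_zip({"city": "Sugar Land", "state": "TX", "zip": ""}))
# -> zip derived as "77478"
print(fill_zip({"city": "", "state": "", "zip": ""}))
# -> unchanged: incomplete data that cannot be corrected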

I'd say the other thing to keep in mind when you're dealing with a data quality program is to make absolutely sure you've got someone who's a stakeholder. Whether it's a business user or an application individual, they can give you the guidance and say, "Here's what I need to get out of data quality." There are probably four or five key aspects to data quality that one needs to understand. If I'm determining what the error is, I need to know and agree on the meaning, how I represent it and what the definition of accuracy is. That seems very simple. For example, if I want to define what the color red is [for] my database … say "Red is a reasonable value for color, but I need to make sure that we all agree that that's an accurate value and that it's represented in a consistent fashion — R, red or, in fact, the three RGB numeric values of 195, 49, 28." But one of the benefits of having a business user in place or that applications person is to establish … what the accurate values might be and whether, in fact, there's enough information to determine what the error value is [and how to correct it].
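
That agreement on accurate values can be captured as a simple standardization rule. The canonical value and its variants in this sketch are assumptions for illustration:

CANONICAL_RED = "RED"
RED_VARIANTS = {"r", "red", "(195, 49, 28)"}  # representations the stakeholders agreed on

def standardize_color(value):
    if value.strip().lower() in RED_VARIANTS:
        return CANONICAL_RED
    return None  # not an agreed value: flag it for review rather than guess

print(standardize_color("R"))          # -> RED
print(standardize_color("195 49 28"))  # -> None: no agreed mapping, so only detectable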

 

Thirteen causes of enterprise data quality problems

Excerpt from Data Quality Assessment by Arkady Maydanchik. Reprinted with permission from Technics Publications, LLC. Copyright 2007.

Data is affected by numerous processes, most of which have an impact on its quality to a certain degree. I had to deal with data quality problems on a daily basis for many years and have seen every imaginable scenario of how data quality deteriorates. Each situation is different, but I eventually came up with a classification shown in Figure 1-1. It shows 13 categories of processes that cause the data problems, grouped into three high-level categories.

Figure 1-1: Processes affecting data quality

In this chapter we will systematically discuss the 13 processes presented in Figure 1-1 and explain how and why they negatively affect data quality.

Cause No. 1 of enterprise data quality problems: Initial data conversion

Databases rarely begin their life empty. More often, the starting point in their lifecycle is a data conversion from some previously existing data source. And by a cruel twist of fate, it is usually a rather violent beginning. Data conversion usually takes the better half of the new system implementation effort and almost never goes smoothly.

Cause No. 2 of enterprise data quality problems: System consolidations

Database consolidations are the most common occurrence in the information technology landscape. They take place regularly when old systems are phased out or combined. And, of course, they always follow company mergers and acquisitions. Database consolidations after corporate mergers are especially troublesome because they are usually unplanned, must be completed in an unreasonably tight time frame, take place in the midst of the cultural clash of IT departments, and are accompanied by inevitable loss of expertise when key people leave midway through the project.

Cause No. 3 of enterprise data quality problems: Manual data entry

Despite high automation, much data is (and will always be!) typed into the databases by people through various forms and interfaces. The most common source of data inaccuracy is that the person manually entering the data just makes a mistake. To err, after all, is human! People mistype; they choose a wrong entry from the list or enter the right data value in the wrong box. I had, at one time, participated in a data-cleansing project where the analysts were supposed to carefully check the corrections before entering them – and still 3% of the corrections were entered incorrectly. This was in a project where data quality was the primary objective!

Cause No. 4 of enterprise data quality problems: Batch feeds

Batch feeds are large, regular data exchange interfaces between systems. The ever more numerous databases in the corporate universe communicate through complex spider webs of batch feeds.

Cause No. 5 of enterprise data quality problems: Real-time interfaces

More and more data is exchanged between the systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to other downstream databases. The advantage is immediate propagation of data to all relevant databases. Data is less likely to be out of sync. You can close your eyes and imagine the millions of little data pieces flying from database to database across vast distances with lightning speed, making our lives easier. You see the triumph of the information age! I see Wile E. Coyote in his endless pursuit of the Road Runner. Going! Going! Gosh!

Cause No. 6 of enterprise data quality problems: Data processing

Data processing is at the heart of all operational systems. It comes in many shapes and forms – from regular transactions triggered by users to end-of-the-year massive calculations and adjustments. In theory, these are repetitive processes that should go "like clockwork." In practice, there is nothing steady in the world of computer software. Programs and underlying data change and evolve, with the result that one morning the proverbial sun rises in the West, or worse yet, does not rise at all.

Cause No. 7 of enterprise data quality problems: Data cleansing

The data quality topic has caught on in recent years, and more and more companies are attempting to cleanse their data. In the old days, cleansing was done manually and was rather safe. New methodologies have arrived that use automated data cleansing rules to make corrections en masse. These methods are of great value and I, myself, am an ardent promoter of the rule-driven approach to automated data cleansing. Unfortunately, the risks and complexities of automated data cleansing are rarely well understood.

Cause No. 8 of enterprise data quality problems: Data purging

Old data is routinely purged from systems to make way for more data. This is normal when a retention limit is satisfied and the old data is no longer necessary. However, data purging is highly risky for data quality.

Cause No. 9 of enterprise data quality problems: Changes not captured

Data can become obsolete (and thus incorrect) simply because the object it describes has changed. If a caterpillar has turned into a butterfly but is still listed as a caterpillar on the finch's menu, the bird has a right to complain about poor data quality.

Cause No. 10 of enterprise data quality problems: System upgrades

Most commercial systems get upgraded every few years. Homegrown software is often upgraded several times a year. While upgrades are not nearly as invasive and painful as system conversions and consolidations, they still often somehow introduce data problems. How can a well tested, better version negatively affect data quality?

Cause No. 11 of enterprise data quality problems: New data uses

Remember that data quality is defined as "fitness to the purpose of use." The data may be good enough for one purpose but inadequate for another. Therefore, new data uses often bring about changes in the perceived level of data quality even though the underlying data is the same. For instance, HR systems may not care too much to differentiate medical and personal leave of absence – a medical leave coded as a personal leave is not an error for most HR purposes. But start using it to determine eligibility for employee benefits, and such minute details become important. Now, a medical leave entered as a personal leave is plain wrong.
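
A small sketch (field names and rules assumed) makes the point -- the identical record passes one use's quality check and fails another's:

record = {"employee": "1042", "leave_type": "PERSONAL"}  # actually a medical leave

def fit_for_hr_reporting(rec):
    # Most HR reporting only needs to know that a leave occurred
    return rec["leave_type"] in {"PERSONAL", "MEDICAL"}

def fit_for_benefits_eligibility(rec, actually_medical=True):
    # Benefits eligibility depends on the precise leave type, so the
    # miscoded value is now plain wrong
    return (rec["leave_type"] == "MEDICAL") == actually_medical

print(fit_for_hr_reporting(record))          # True  -- fit for this purpose
print(fit_for_benefits_eligibility(record))  # False -- same data, new use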

Cause No. 12 of enterprise data quality problems: Loss of expertise

In almost every data quality project on which I have worked, there is a Dick or Jane or Nancy whose data expertise is unparalleled. Dick has been with the department for the last 35 years and is the only person who really understands why, for some employees, date of hire is stored in the date of birth field, while for others it must be adjusted by exactly 17 days. Jane still remembers the times when she did calculations by hand and entered the results into a system that was shut down in 1985, and she still sometimes accesses the old data when in doubt. When Nancy decided to retire, she was offered hourly work from home at double her salary. Those are true stories.

Cause No. 13 of enterprise data quality problems: Process automation

With the progress of information technology, more and more tasks are automated. It starts from replacement of data entry forms with system interfaces and extends to every layer of our life. Computer programs process and ship orders, calculate insurance premiums, and even send spam – all with no need for human intervention. Where in the past a pair (or several pairs) of human eyes with the full power of trained intellect protected the unsuspecting customers, now we are fully exposed to a computer's ability to do things that are wrong and not even feel sorry.

 

Data quality assurance

The following is an excerpt from Data Quality: The Accuracy Dimension by Jack E. Olson. Reprinted with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2003.

Goals of a data quality assurance program

A data quality assurance program is an explicit combination of organization, methodologies and activities that exist for the purpose of reaching and maintaining high levels of data quality. The term assurance puts it in the same category as other functions corporations are used to funding and maintaining. Quality assurance, quality control, inspection, and audit are terms applied to other activities that exist for the purpose of maintaining some aspect of the corporation's activities or products at a high level of excellence. Data quality assurance should take place alongside these others, with the same expectations.

Just as we demand high quality in our manufactured products, financial reports, information systems infrastructure, and other aspects of our business, we should demand it from our data.

The goal of a data quality assurance program is to reach and maintain high levels of data accuracy within the critical data stores of the corporation. It must encompass all existing, important databases and, crucially, be a part of every project that creates new data stores or that migrates, replicates or integrates existing data stores. It must address not only the accuracy of data when initially collected but accuracy decay, accurate access and transformation of that data, and accurate interpretation of the data for users. Its mission is threefold: improve, prevent, monitor.

Improvement assumes that the current state of data quality is not where you want it to be. Much of the work is to investigate current databases and information processes to find and fix existing problems. This effort alone can take several years for a corporation that has not been investing in data quality assurance.

Prevention means that the group should help development and user departments in building data checkers, better data capture processes, better screen designs, and better policies to prevent data quality problems from being introduced into information systems. The data quality assurance team should engage with projects that build new systems, merge systems, extract data from new applications, and build integration transaction systems over older systems to ensure that good data is not turned into bad data and that the best practices available are used in designing human interfaces.

Monitoring means that changes brought about through data quality assurance activities need to be monitored to determine whether they are effective. Monitoring also includes periodic auditing of databases to ensure that new problems are not appearing.

Structure of a data quality assurance program

Creating a data quality assurance program and determining how resources are to be applied needs to be done with careful thought. The first decision is how to organize the group. The activities of the group need to be spelled out. Properly skilled staff members must be assigned. They then need to be equipped with adequate tools and training.

Data Quality Assurance Department

There should be a data quality assurance department. This should be organized so that the members are fully dedicated to the task of improving and maintaining higher levels of data quality. It should not have members who are part-time. Staff members assigned to this function need to become experts in the concepts and tools used to identify and correct quality problems. This will make them a unique discipline within the corporation. Figure 4.1 is a relational chart of the components of a data quality assurance group.

Figure 4.1: Relational chart of the components of a data quality assurance group

The group needs to have members who are expert data analysts. Analyzing data is an important function of the group. Schooling in database architecture and analytical techniques is a must to get the maximum value from these activities. It should also have staff members who are experienced business analysts. So much of what we call quality deals with user requirements and business interpretation of data that this side of the data cannot be ignored.

The data quality assurance group needs to work with many other people in the corporation. It needs to interact with all of the data management professionals, such as database administrators, data architects, repository owners, application developers, and system designers. They also need to spend a great deal of time with key members of the user community, such as business analysts, managers of departments, and Web designers. This means that they need to have excellent working relationships with their customers.

One way to achieve a high level of cooperation is to have an advisory group that meets periodically to help establish priorities, schedules and interactions with the various groups. This group should have membership from all of the relevant organizations. It should build and maintain an inventory of quality assurance projects that are worth undertaking, keep this list prioritized, and assign work from it. The advisory group can be very helpful in assessing the impact of quality problems as well as the impact of corrective measures that are subsequently implemented.

Data quality assurance methods

Figure 4.2 shows three components a data quality assurance program can build around. The first component is the quality dimensions that need to be addressed. The second is the methodology for executing activities, and the last is the three ways the group can get involved in activities.

Figure 4.2: Three components a data quality assurance program can build around

The figure highlights the top line of each component to show where a concentration on data accuracy lies. Data accuracy is clearly the most important dimension of quality. The best way to address accuracy is through an inside-out methodology, discussed later in the book. This methodology depends heavily on analysis of data through a process called data profiling. The last part of this book is devoted to explaining data profiling. Improving accuracy can be done through any of the activities shown. However, the one that will return the most benefit is generally the one shown: project services.

Any data quality assurance function needs to address all of the dimensions of quality. The first two, data accuracy and completeness, focus on data stored in corporate databases. The other dimensions focus on the user community and how they interpret and use data.

Figure 4.3: Two examples of methods for addressing data quality

The methods for addressing data quality vary, as shown in Figure 4.3. Both of these methodologies have a goal of identifying data quality issues. An issue is a problem that has surfaced, that is clearly defined, and that either is costing the corporation something valuable (such as money, time or customers) or has the potential to cost the corporation something valuable. Issues are actionable items: They result in activities that change the data quality of one or more databases. Once identified, issues are managed through an issues management process to determine value, remedies, resolution, and monitoring of results. The process of issue management is discussed more fully in the next chapter.

 

Data quality: Why management should care about bad data

The following is an excerpt from Data Quality: The Field Guide by Tom Redman. Reprinted with permission from Digital Press. Copyright 2001.

Some think that no two words can cause a CEO's eyes to glaze over faster than "data quality" (and this applies to heads of government agencies, leaders of nonprofit organizations, etc.). "Data," aren't they the boring bits and bytes buried in our computer systems? And "quality," isn't that the implication that our people aren't working hard enough?

Besides, people have real work to do, customers to satisfy, production schedules to meet, decisions to make, strategies to map out, a demanding board to satisfy. Who wants to worry about those bits and bytes when no one is complaining?

But CEOs are (or should be) passionately interested in data quality, and for a wide variety of reasons.

First, bad data can earn the CEO and his or her organization a place in the national news -- and who needs that? The bombing of the Chinese Embassy is the most publicized recent example. But it happens more frequently than one might think.

Fortunately, most cases of bad data do not land the organization or its leader on the front page. Unfortunately, poor-quality data seems to be the norm. As CEOs know, the costs of poor-quality data are enormous. Some costs, such as added expense and lost customers, are relatively easy to spot, if the organization looks. We suggest (based on a small number of careful, but proprietary, studies), as a working figure, that these costs are roughly 10% of revenue for a typical organization. To date, no one, in hundreds of discussions, has suggested that this number is "way too high." CEOs naturally want to return these monies to the bottom line.


This was first published in July 2009
