Establishing and maintaining data quality is a constant challenge for IT shops -- and the unbridled growth of data generation is making that task increasingly difficult. In addition, the reliance on data to identify new business insights and facilitate effective decision-making is increasing at a rapid pace. IT shops are contending with more data that is more important to their organization.
A reactive approach to data quality improvement is like firefighting: We respond to data quality issues after they occur. We jump from one crisis to another, and the biggest problems garner the most attention. Without a proactive data quality process in place, the number of issues begins to escalate. A proactive data quality improvement program aims to identify data quality issues before they become a problem.
From a two-person shop to a 20,000-person one, data is an enterprise asset. The only difference should be the size of the team responsible for ensuring data quality across the enterprise. Most data elements don't sit idle. They find their way into multiple data stores. An incorrect data value is like a virus: Once created, it will work its way into reports, dashboards and other data stores across the organization.
Here are some recommendations to help you build a proactive data quality improvement program. The list is by no means all-inclusive, but it will help you begin thinking about the process of improving data quality. The ultimate goal should be to ensure that enterprise data is accurate and consistent, the same objective as that of the data governance programs that often incorporate data quality efforts.
Build and foster a data quality mindset
Like all organizational initiatives, creating a data quality improvement mindset begins at the top of the org chart. Getting upper management buy-in is critical. Identify team members responsible for data quality and market the benefits of better data quality to both IT personnel and business users. The idea is to integrate data quality into the organizational fabric.
The right people with the right skills
Data quality specialists are challenging to find and can be expensive. But that doesn't prevent you from "growing your own." Identify personnel who express an interest and provide them with the time and training to learn the science of data quality. You build the program based on budgetary constraints and the human assets you have available.
No money, no tools, no problem
Don't have enough funds to invest in data quality tools, data governance software or master data management (MDM) products? That's certainly a challenge, but not an excuse. I've reviewed several startup data quality programs that were a patchwork of documents, procedures, process libraries and open source products. If the organizational desire is there, you can create a robust, proactive data quality improvement program. There are several open source MDM, data governance and data quality tools available to use, such as Talend, Pimcore and OSDQ.
Data quality begins at creation and acquisition
To develop high-quality data sets, you must follow best practices when creating data or acquiring it from external sources, including sets of big data. Meet with information consumers to determine how they use the data and identify the business policies that govern it. Then you can develop a standard and build data definition rules to enforce conformity.
Most databases provide a robust set of constraints to enforce data conformity. For non-database platforms that enforce conformity programmatically, investigate other mechanisms to store common code and data quality rules. I also highly recommend that organizations of all sizes evaluate MDM and data governance product suites. Their framework of procedures and tools will become the foundation of your data quality program and help you to more quickly establish and enforce your enterprise-wide single source of truth.
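As a rough illustration of database constraints enforcing data definition rules, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and rules (a two-letter country code, a non-negative credit limit, a loosely shaped email address) are hypothetical examples, not rules taken from any particular standard. Your own rules would come out of the meetings with information consumers described above.

```python
import sqlite3


def build_schema(conn):
    """Create hypothetical tables whose constraints encode data definition rules."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE country (code TEXT PRIMARY KEY CHECK (length(code) = 2))"
    )
    conn.execute(
        """
        CREATE TABLE customer (
            customer_id  INTEGER PRIMARY KEY,
            -- Required, unique, and loosely shaped like an address.
            email        TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),
            -- Must match a row in the country reference table.
            country_code TEXT NOT NULL REFERENCES country(code),
            -- Negative credit limits are rejected at write time.
            credit_limit REAL NOT NULL CHECK (credit_limit >= 0)
        )
        """
    )


def try_insert(conn, email, country_code, credit_limit):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (email, country_code, credit_limit) "
                "VALUES (?, ?, ?)",
                (email, country_code, credit_limit),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With a schema like this, a bad value is stopped at the point of creation instead of being discovered later in a report. The same idea carries over to other database platforms, although the exact constraint syntax varies.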
Maintaining data quality
Performing proactive data quality checks and reviews will be a core component of your improvement program. The process should be (1) identify, (2) prioritize, (3) evaluate, (4) correct: Identify the data and the subject matter experts, prioritize the importance of the data to the organization, evaluate the most important data assets and correct inaccurate data values.
When you identify incorrect data values, your goal should be to determine the severity and scope of the impact and identify the root cause of the issue. Then take the necessary steps to fix the incorrect data values and address the root cause that created the problem.
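For shops without a dedicated tool, the identify, prioritize, evaluate and correct cycle can start as a small script. The sketch below is one possible shape for it, assuming records arrive as plain Python dictionaries. The Rule class, the rule names and the priority numbers are all hypothetical. It covers the evaluate step and summarizes scope per rule; correcting values and tracing the root cause remain manual work.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One data quality rule for one field (hypothetical structure)."""
    name: str
    field: str
    priority: int  # 1 = most important to the organization
    is_valid: Callable[[object], bool]


# Example rules; real ones would come from subject matter experts.
RULES = [
    Rule("age_in_range", "age", 2,
         lambda v: isinstance(v, int) and 0 <= v <= 120),
    Rule("email_present", "email", 1,
         lambda v: isinstance(v, str) and "@" in v),
]


def evaluate(records, rules):
    """Return (priority, rule name, record index, bad value) tuples.

    Rules are checked in priority order so the most important
    problems appear first in the result.
    """
    violations = []
    for rule in sorted(rules, key=lambda r: r.priority):
        for index, record in enumerate(records):
            value = record.get(rule.field)
            if not rule.is_valid(value):
                violations.append((rule.priority, rule.name, index, value))
    return violations


def scope_by_rule(violations):
    """Count violations per rule to gauge the scope of each issue."""
    return dict(Counter(name for _, name, _, _ in violations))
```

Running evaluate on a schedule and reviewing scope_by_rule over time gives a simple signal of whether a root cause has actually been fixed or is still producing bad values.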
Numerous data quality products are available to assist you in your analysis, including offerings from IBM, Informatica, Information Builders, Oracle, SAP, SAS, Syncsort and Talend. Resources such as the user reviews of data quality tools on Gartner's Peer Insights can help you compare the competing offerings.
It's never too late to build your proactive data quality program. Your organization will benefit from fewer data quality issues and a reduction in firefighting activities.