This article originally appeared on the BeyeNETWORK
In last month’s article, I discussed the importance of creating a data governance entity to ensure that data quality problems are addressed. A data governance entity delegates authority from the senior-most levels of the firm and holds the appropriate parties accountable for resolving data issues, since quality is one of the more pressing issues faced by financial services institutions today. With a data governance entity in place and quality raised as a top-tier issue, the question now becomes how to fix data quality problems.
The 26 Variables Explosion
Here’s a real situation to consider. A major U.S. bank committed several hundred million dollars to developing and implementing a CRM system. The key to the success of the implementation is quality customer data. Given the bank’s inability to certify the data in its entirety, and because the whole system represents a significant operating risk to the bank, the board of directors directed the bank to ensure the quality of the customer data.
To simplify the process, a team of business owners of the CRM system identified 26 “key performance indicators” (KPIs), pieces of information that were “critical” to the system. The thought was that by focusing on a small number of KPIs, the process of ensuring the quality of data would be simplified. The next step was to profile the KPIs and determine their quality. In the process, the team found that these 26 pieces of information were represented by more than 5,000 specific data fields located across 26 different operating systems. Profiling and rationalizing all of these became a monumental task.
While data quality is very important, it also can be one of the most frustratingly elusive goals one would ever chase—particularly for enterprises with high data volumes or extremely complex data, like many financial service firms. The problem seems overwhelming and intractable.
Let’s discuss three strategies for making data quality initiatives more manageable.
Strategy No. 1—Keep a Context-Specific Mindset
The fundamental principle to remember about data quality is that data has to be suitable for its intended use. This means that data quality is defined differently for each use and each system. Therefore, when undertaking a data quality initiative, it’s important to understand the context of how the data is being used and to understand that user satisfaction is tied directly to data quality. There may, in fact, be defects in the data that do not impact a specific use; for that purpose, the data is fine. However, if a data quality defect directly impacts the use of the data, users will realize that this data does not meet their requirements. Keep this context-specific mindset when dealing with data quality challenges. Don’t fix what isn’t “broken” from an end-user standpoint.
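One way to picture this context-specific mindset is as a separate set of quality rules for each use of the data. The sketch below is only an illustration: the context names, field names and rules are hypothetical and do not come from the bank example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical contexts and rules, invented for illustration only.
RULES_BY_CONTEXT: Dict[str, List[Rule]] = {
    "mail_campaign": [lambda r: bool(r.get("postal_address"))],
    "email_campaign": [lambda r: "@" in r.get("email", "")],
}

def fit_for_use(record: Record, context: str) -> bool:
    """Return True if the record passes every rule defined for this context."""
    return all(rule(record) for rule in RULES_BY_CONTEXT[context])

# The same record, with the same defect (an empty email field).
customer = {"name": "A. Smith", "postal_address": "1 Main St", "email": ""}

print(fit_for_use(customer, "mail_campaign"))   # True
print(fit_for_use(customer, "email_campaign"))  # False
```

The missing email address is a defect only for the email campaign. For the mail campaign, the record is fit for use and needs no fixing.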
Strategy No. 2—Address Data Quality Issues at an Atomic Level
Once you’ve established the context in which the data is being used, the next thing to understand is the correlation between the sophistication of the techniques that are used to improve data quality and the granularity of the data itself.
Atomic data refers to fine-grained data or data that is at the lowest level—not data that is explosive if mishandled (although some may argue this point). Such atomic data will usually be easier to bring to a higher quality level. As you begin to move into the area of derived data, summarized data or aggregated data, the rules for achieving data quality become more sophisticated and complex.
This happens primarily because derived data depends on more than one set of atomic data. Derived data is affected by the data quality issues of multiple sets of data and by the multiple sets of rules that have been applied to them. This explains the data “explosion” experienced by the bank in the example above.
Additionally, it’s important to recognize that any change to the quality of atomic data will then impact the quality of any data that is derived from that data. Even some “improvements” to the quality of atomic data may impact, positively or negatively, the quality of data that is derived from it in the process.
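The dependency between atomic and derived data can be sketched with a small example. The account numbers, customer identifiers and balances here are hypothetical, chosen only to show how a derived total depends on two atomic data sets at once.

```python
from collections import defaultdict

# Hypothetical atomic data: account balances and account-to-customer links.
balances = {"ACC1": 100.0, "ACC2": 250.0, "ACC3": 50.0}
owners = {"ACC1": "CUST_A", "ACC2": "CUST_A", "ACC3": "CUST_B"}

def total_by_customer(balances, owners):
    """Derive the total balance per customer from the two atomic data sets."""
    totals = defaultdict(float)
    for account, balance in balances.items():
        # An account with no owner link is counted under "UNKNOWN".
        totals[owners.get(account, "UNKNOWN")] += balance
    return dict(totals)

before = total_by_customer(balances, owners)

# Correct one atomic record: assume ACC2 was linked to the wrong customer.
corrected_owners = {**owners, "ACC2": "CUST_B"}
after = total_by_customer(balances, corrected_owners)

print(before)  # {'CUST_A': 350.0, 'CUST_B': 50.0}
print(after)   # {'CUST_A': 100.0, 'CUST_B': 300.0}
```

A single correction to one atomic link changes the derived totals for two customers, so any report or model built on those totals changes as well.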
Strategy No. 3—Think of Your Data Quality Efforts as a Pyramid
A pyramid provides a strong visual metaphor for thinking about where to start your data quality efforts. You should begin by working on less complex data quality issues, such as data type and domain, gradually building a strong foundation for addressing more complex aspects of data quality, such as business rule conformance. The six levels of data quality sophistication listed below will allow you to build a solid base for your data quality efforts before moving on to more complex issues.
- Data type and domain: This is where most data quality initiatives start (and unfortunately end). This level lays the foundation for data quality improvement. Begin by looking at specific data types and attribute value domains to ensure conformance. In character fields, frequent problems include non-printable characters, field-length mismatches and character set mismatches. In date/time fields, problems include time zone ambiguities, out-of-range values (especially 00/00/00 and 99/99/99) and sub-element misalignment. Categorical values frequently contain “surprise values” such as a gender code with “M,” “F” and an unexpected “Z” or “?.”
- Completeness: Completeness applies both logically and physically. Logically, temporal inconsistencies are common culprits; for example, average daily balance vs. current balance. Physical completeness is often compromised at interface points between systems, especially external systems.
- Uniqueness and referential integrity: Uniqueness applies within an entity. Referential integrity applies to uniqueness in foreign-key/primary-key relations between entities. Database indexes and referential integrity constraints can help avoid these problems; however, not all data is within a relational database. Some of the relations cross databases and technologies, and not all databases, especially analytically focused ones, can afford to implement referential integrity constraints. Without rigorous enforcement, uniqueness and referential integrity violations are commonplace.
- Consistency: While referential integrity addresses entity relationships, consistency is concerned with content overlaps and inconsistencies of data as replicas and derivations are created and stored. In a data warehouse environment, data often is extracted from multi-mastered replica databases. Changes in business requirements regularly lead to database schema changes. Data quality is compromised when these schema changes reinterpret replicated or derived data without regard to its original definition.
- Freshness and timeliness: Freshness addresses the currency of the content of the data. Timeliness addresses when data becomes available to users or downstream systems. A month-end process completing on Feb. 1 may be timely, but not fresh, because January data was incomplete. On the other hand, we may have all January data available, but month-end processing may not complete until Feb. 15, far too late for timely business decision-making.
- Business rules conformance: Business rules conformance is the most complex of the sophistication levels. It deals with whether data is used and transformed consistently with its intent, definition and semantics. Finding concurrence for these across the enterprise is difficult enough; business-driven changes to intent, definition and semantics exacerbate the problem.
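The lower levels of the pyramid lend themselves to simple, mechanical checks. The following is a minimal sketch of the first three levels (domain, physical completeness, and uniqueness and referential integrity). The records, field names and reference values are hypothetical, and a real profiling effort would run comparable checks against the actual source systems.

```python
# Hypothetical records and reference data, invented for illustration only.
customers = [
    {"id": "C1", "gender": "M", "branch": "B1"},
    {"id": "C2", "gender": "Z", "branch": "B1"},   # "surprise value" in gender
    {"id": "C3", "gender": "F", "branch": None},   # missing branch
    {"id": "C4", "gender": "F", "branch": "B9"},   # branch not in reference data
    {"id": "C4", "gender": "M", "branch": "B2"},   # duplicate customer id
]
branches = {"B1", "B2"}
GENDER_DOMAIN = {"M", "F"}

def domain_violations(rows, field, domain):
    """Data type and domain: ids of rows whose value is outside the domain."""
    return [r["id"] for r in rows if r[field] not in domain]

def missing_values(rows, field):
    """Completeness (physical): ids of rows where the field is empty."""
    return [r["id"] for r in rows if r[field] is None]

def duplicate_keys(rows, key):
    """Uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        (dupes if r[key] in seen else seen).add(r[key])
    return sorted(dupes)

def orphaned_references(rows, field, parent_keys):
    """Referential integrity: ids of rows pointing at a non-existent parent."""
    return [r["id"] for r in rows
            if r[field] is not None and r[field] not in parent_keys]

print(domain_violations(customers, "gender", GENDER_DOMAIN))  # ['C2']
print(missing_values(customers, "branch"))                    # ['C3']
print(duplicate_keys(customers, "id"))                        # ['C4']
print(orphaned_references(customers, "branch", branches))     # ['C4']
```

The higher levels (consistency, freshness and timeliness, and business rules conformance) depend on definitions agreed across systems and business units, which is why they are harder to reduce to checks like these.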
These principles are sound for any business or technical practitioner to observe, regardless of industry. But with increased regulatory pressures and heightened competition within financial services today, these guidelines are particularly relevant to you. Keep these six sophistication levels in mind when selecting tools, defining methodologies or implementing processes for data quality analysis and remediation. These levels lend structure to your efforts and help you avoid potential frustration.