This article originally appeared on the BeyeNETWORK
In a previous article, I discussed the situation in which a bank CEO believed that data was a constant obstacle for his bank. Whenever management had to make key decisions, regulators had to ask for more analysis or product developers had to make enhancements, the answer was always the same—“We don’t have the data and it will take weeks, maybe months, to get it.”
Recently, I had an intriguing discussion at another financial institution. Regarding their data access for regulatory compliance, someone explained that all required data was already in one or more corporate data stores. Here, the issue was extracting the right data and then reconciling that data to “known” enterprise totals. The process looks similar to the following chart, complete with the “@&$#*$# Loop.”
This particular institution measured the time required to prepare data for analysis in week-long increments. The process began with extracting data. After loading the data into the reporting tool and running it through to create a report, analysts found that the data was wrong, hence the “expletive deleted” loop. Once the data was “as good as it was going to get,” the bank began the manual scrub, fix & foot (MSF&F) stage, with numerous exports to spreadsheets, in which the data was manually corrected until it footed to the known enterprise totals. The institution estimated that 90% of the effort was related to the data’s poor quality. Essentially, a huge investment in time and resources was being wasted.
Given the size of the institution, our group quickly concluded that there were tens of thousands of “@&$#*$#” Loops, as well as existing MSF&F spreadsheets throughout the bank. All of this “organic growth” was occurring beyond the scrutiny of audit or the control of corporate governance. The enterprise view looked similar to the following chart.
This quick analysis led to several realizations:
- Too many of the key management reports and regulatory reports were actually based on manually edited spreadsheets and, therefore, not compliant with Sarbanes-Oxley.
- Almost 90% of the effort to obtain, analyze and utilize data was tied up in extract delays, data checking/verification and MSF&F activities.
- In the case of several regulatory/audit reports, actually beginning the MSF&F activity took months—not hours or even days.
- In short, the data quality in the Reporting Data Store produced the productivity delays and created the regulatory risks.
After coming to these realizations, everyone in the room understood the value of data quality. They now viewed it as a productivity tool with a solid, hard-dollar payback that can be measured in millions per year. The key to reaping these dividends lies in implementing an automated data quality process to deliver “appropriately certified” data to each user in the organization.
“Appropriately Certified” Data
“Appropriately Certified” data means that data quality is also about balancing risk pyramids. The next chart illustrates the multi-scale balancing act that must be accomplished. The pyramid on the left represents the pyramid of users in any organization. At the base, there are a very large number of data users throughout each line of business. The middle pyramid represents market risk: the risk that these users will make decisions or use the data in a manner that could negatively impact the capital of the institution. As you can see, the users throughout the lines of business carry a low market risk. In contrast, as the pyramid on the right shows, this large user base represents a potentially high operations risk. A relevant example of this is the threat of identity theft.
Executive management, however, represents a low operations risk. At the same time, they have a potentially high market risk if they use data to make inappropriate decisions that negatively impact stock prices. Such decisions will negatively impact the capital position of the institution.
When I talk about “appropriately certified” data, I am referring to the fact that all data (within an organization) need not conform to the highest levels of data quality. This provides us with the ability to prioritize the implementation of data quality programs, based on measured and quantifiable risk profiles.
The reality is that data quality is not a binary attribute. Quality lies on a continuum and is measurable, so it can be tracked over time.
This ability to measure data quality over time leads to a powerful productivity and control tool—the Service Level Agreement (SLA). With an appropriate data quality process, the implementation of Quality SLAs is entirely possible. With SLAs in place and measured, the organization can accurately certify data for appropriate use, thereby completing the balancing act.
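To make the idea of a Quality SLA concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular institution's process: the `QualityMeasurement` fields, the sample figures, and the two threshold values are all hypothetical. It simply records a valid-record ratio for each load and flags the loads that fall below the SLA for a given class of user.

```python
from dataclasses import dataclass


@dataclass
class QualityMeasurement:
    load_date: str       # date of the data load (illustrative values only)
    records_total: int   # records received in the load
    records_valid: int   # records that passed all quality checks

    @property
    def valid_ratio(self) -> float:
        # Treat an empty load as zero quality rather than dividing by zero.
        if self.records_total == 0:
            return 0.0
        return self.records_valid / self.records_total


def sla_breaches(history, threshold):
    """Return the load dates whose valid-record ratio fell below the threshold."""
    return [m.load_date for m in history if m.valid_ratio < threshold]


# Made-up measurements for three consecutive loads.
history = [
    QualityMeasurement("2024-01-01", 1000, 990),
    QualityMeasurement("2024-01-02", 1000, 940),
    QualityMeasurement("2024-01-03", 1000, 985),
]

# Hypothetical SLA tiers: stricter for data feeding executive and
# regulatory reports, looser for everyday line-of-business use.
EXECUTIVE_SLA = 0.98
LINE_OF_BUSINESS_SLA = 0.90

print(sla_breaches(history, EXECUTIVE_SLA))
print(sla_breaches(history, LINE_OF_BUSINESS_SLA))
```

The same history can satisfy one tier and breach another, which is the sense in which data is “appropriately certified”: the certification depends on who will use the data and how much risk that use carries.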
The Data Quality Process
The next chart illustrates the data quality process that is embedded in the ETL process, which is used to generate most data stores for reporting and analysis. As you can see, the process is structured in a way that can be highly automated. The structure also relies heavily on the use of profile, meta, reference and master data to drive the verification, remediation, transformation and certification steps. When measuring quality, extensive metrics can be captured at each stage of the process. These can then be tracked over time, thus enabling the development of actionable SLAs and certified data stores.
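The verification, remediation and certification steps, with metrics captured along the way, can be sketched roughly as follows. This is a simplified Python illustration, not the process shown in the chart: the branch codes, the alias table, the record layout and the function names are all hypothetical stand-ins for real reference data and rules.

```python
# Hypothetical reference data: canonical branch codes and known aliases.
VALID_BRANCHES = {"NYC", "CHI", "LAX"}
BRANCH_ALIASES = {"NEW YORK": "NYC", "CHICAGO": "CHI"}


def verify(record):
    """A record passes when its branch is canonical and its balance is numeric."""
    return record.get("branch") in VALID_BRANCHES and isinstance(
        record.get("balance"), (int, float)
    )


def remediate(record):
    """Try to repair a record using reference data; return a corrected copy."""
    fixed = dict(record)
    branch = str(fixed.get("branch", "")).strip().upper()
    fixed["branch"] = BRANCH_ALIASES.get(branch, branch)
    return fixed


def run_quality_stage(records):
    """Verify, remediate and certify records, counting the outcome of each step."""
    metrics = {"received": 0, "passed": 0, "remediated": 0, "rejected": 0}
    certified = []
    for record in records:
        metrics["received"] += 1
        if verify(record):
            metrics["passed"] += 1
            certified.append(record)
            continue
        repaired = remediate(record)
        if verify(repaired):
            metrics["remediated"] += 1
            certified.append(repaired)
        else:
            metrics["rejected"] += 1
    return certified, metrics


# Made-up input: one clean record, one fixable record, one unfixable record.
records = [
    {"branch": "NYC", "balance": 100.0},
    {"branch": "chicago ", "balance": 250.0},
    {"branch": "LAX", "balance": None},
]
certified, metrics = run_quality_stage(records)
print(metrics)
```

The counts returned by each run are the raw material for tracking quality over time: stored per load, they show whether rejection and remediation rates are rising or falling.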
Through this short overview of the comprehensive data quality process, I hope I have stimulated readers to think outside the box with regard to data and information productivity. Clearly, a few extra steps in constructing the infrastructure and ensuring quality processes can lead to tremendous productivity gains and cost savings.
Before closing, however, please consider this cautionary note: A surface-level understanding of this process may give one the impression that by applying this structure across the enterprise, you’ll reach the proverbial “Pot-O’Gold.” On the contrary, such a massive effort has a greater chance of failure. It is much better to use the Risk Pyramids as a guide to rank and prioritize various new and existing analyses and reporting efforts. After completing this, you can implement the comprehensive data quality process in one or two places to demonstrate its value. Success breeds success, and eventually enterprise-wide implementations.