When I started working as a database administrator in 1983, it was all about centralization in technology. Data was safely on the corporate mainframe, and only programmers with the skills to navigate prerelational databases could access it. Nearly four decades later, it's all about data democratization and the need for a strong data governance strategy.
Back in the day, business analysts had to go cap in hand to the IT department because they didn't know how to navigate an Information Management System database and wouldn't have been granted access even if they could. The IT department printed off monthly reports and distributed them, like Moses descending from the mountain with tablets of stone.
With the advent of the personal computer, the balance of power shifted radically. Suddenly, businesspeople had access to spreadsheets and could create their own calculations and analyses, even if the data was still mostly out of reach. Then came client/server computing and a rush to decentralize data, bringing giddy new possibilities but also confusion, as different departments worked from different versions of the data and analysts fought over whose version was correct. Analytics could now be done by business analysts, but without agreement on the legitimacy of the data sources, chaos ensued.
The dawn of data governance
IT responded with the data warehouse, which would gather up data from disconnected transaction systems for the sole purpose of analytics. Clever reporting tools appeared that made it easier to manipulate, join and summarize raw tables of transactions and maybe even download them to spreadsheets. Sure, the original data was still stored in different applications and formats, but with enough effort, the data warehouse could be coaxed into making sense of all this, providing dimensions like customer, product, asset and location. However, to actually produce consistent lists of customers and products, the inconsistencies of the underlying systems had to be resolved.
Master data management (MDM) was born and, alongside it, the need for a data governance strategy. Business users were encouraged or cajoled into deciding which classifications of customers and products were "golden records" to be held aloft across the enterprise and which were to be cast into the wilderness of department-specific, local terminology. This was a frequently acrimonious process, with different departments arguing over which was the best way to classify data. Some company cultures suit this approach more than others. Highly centralized companies are used to having structure dictated from on high, but decentralized ones rail against this and struggle to keep within data governance structures. Analysts in such companies think of themselves as freedom fighters, whereas those in the central office regard them as data terrorists.
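To make the golden-record idea concrete, here is a minimal sketch of one common MDM technique, survivorship: when the same customer appears in several systems, each field of the merged record is taken from the most trusted source that has a value. The field names, source systems and trust ranking below are hypothetical, not from any particular MDM product.

```python
# Hypothetical trust ranking of source systems: lower number wins.
SOURCE_RANK = {"crm": 0, "billing": 1, "legacy": 2}

def golden_record(duplicates):
    """Merge duplicate records for one customer: for each field,
    keep the non-empty value from the most trusted source."""
    fields = {f for rec in duplicates for f in rec if f != "source"}
    ordered = sorted(duplicates, key=lambda r: SOURCE_RANK[r["source"]])
    merged = {}
    for field in fields:
        for rec in ordered:
            if rec.get(field):          # skip empty/missing values
                merged[field] = rec[field]
                break
    return merged

records = [
    {"source": "legacy", "name": "ACME Corp.", "phone": "555-0100", "segment": ""},
    {"source": "crm", "name": "Acme Corporation", "phone": "", "segment": "Enterprise"},
]
print(golden_record(records))
# The CRM name and segment win, but the phone survives from the legacy system.
```

The hard part in practice is not the merge logic but the arguments the article describes: which system is "most trusted" for which field is exactly what the departments fight over.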
It seems clear that, at least in a lot of companies, the freedom fighters are now in the ascendant. A sign of this is the growing market for data preparation tools. These products can access data from a wide variety of sources, including traditional databases, application packages, Excel files and applications outside the corporate firewall. They enable data quality techniques, such as profiling, and empower business users to set up data transformations and automate extracts, data scrubbing and transformations via repeatable workflow processes. These tools offer analytical capabilities of their own or can invoke the latest visualization and data mining products to enable analysts to manipulate data to their heart's content.
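Profiling, the quality technique mentioned above, is simple at its core: scan the data and report, column by column, how complete and how varied it is, so an analyst can spot missing or suspect values before trusting a source. Here is a minimal sketch under assumed sample data; real data preparation tools do far more, but this is the basic idea.

```python
from collections import defaultdict

def profile(rows):
    """Report, per column, how many values are missing (nulls)
    and how many distinct non-empty values appear."""
    stats = defaultdict(lambda: {"nulls": 0, "values": set()})
    for row in rows:
        for col, val in row.items():
            if val in (None, ""):
                stats[col]["nulls"] += 1
            else:
                stats[col]["values"].add(val)
    return {col: {"nulls": s["nulls"], "distinct": len(s["values"])}
            for col, s in stats.items()}

# Hypothetical extract from a customer table.
rows = [
    {"customer": "Acme", "country": "US"},
    {"customer": "Acme", "country": ""},
    {"customer": "Globex", "country": "DE"},
]
print(profile(rows))
# Flags that one country value is missing before the data feeds any analysis.
```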
Such a market would simply not exist if corporate data warehouses and MDM were doing their job. Data preparation, quality checking and transformation are exactly what's supposed to happen as preparation to feed data into the data warehouse. The trouble is that the corporate data warehouse has been stretched beyond its natural limits. Data now comes from such a variety of sources -- many of them outside the enterprise -- and in such volumes that traditional data management approaches are breaking down.
E-commerce systems may generate web traffic logs of such size that normal databases cannot handle the processing. Sensors on vehicles and machinery now generate huge volumes of streaming data: A Boeing 787 generates almost a terabyte per flight. It is the same story in other industries, with cars, smart meters in homes and even sensors under roads generating huge volumes of data to be analyzed. All this is in addition to the traditional corporate data, plus data coming in from business partners and data brokers. With that much data coming at you, who has time for meetings to discuss the merits of different customer classification hierarchies?
Mastering data management
Corporations need to somehow take back control of this fast-flowing stream of data if they are to make sense of it. Data lakes become data swamps if there is no way to peer into their depths and make sense of what is there. Data governance strategy may not be a sexy topic, but it's at the heart of what needs to happen. Those analysts using new tools to build their own extracts and transformations need to help decide how that data is managed, because all the pretty charts and AI tools mean nothing if you cannot agree on whether the underlying data is trustworthy.
In the absence of some structure, we will just be back to the old days, with analysts waving charts at each other and arguing over whose data is correct. Putting the data genie back in the bottle will be difficult and require discipline, but in all too many organizations, things now feel chaotic rather than managed. It is not about imposing rules from on high but about embedding analytics and data management discipline throughout the layers of the organization. Otherwise, valuable business insights may be overlooked, and competitive advantage lost.
A 2018 McKinsey report reckoned that high-performing companies were more than twice as likely to have a strong data governance strategy and twice as likely to have a clear and well-understood data strategy overall. The same report found that the gap between the high performers and the pack was growing rapidly. Time is of the essence if you are to exploit analytics and gain business advantage.