michelangelus - Fotolia
Data warehouses are quite long in the tooth. The fundamental idea is that most enterprises have built up various applications to automate their basic business processes, which have grown independently, resulting in multiple versions of data. In the past, for those who wanted to see what sales revenues or profitability was by product line or region, the inconsistent data was a major headache.
The original concept of a data warehouse, in essence, was to copy key data from various transaction systems, resolve inconsistencies and produce a clean data set that could be easily analyzed. This process was usually done with periodically run programs that would keep the data warehouse up to date as new data was ingested.
The history of the data warehouse environment
There are many components to keeping a data warehouse environment up and running. It requires data extract programs, and the data warehouse has to be designed with its own schema. A modern data warehouse also requires further programs to resolve competing versions of contextual data with business rules to decide hierarchies for the master version of the data.
One historical challenge was that the data warehouse was similar to a building constructed on the shifting sands of the underlying operational systems. When these systems changes significantly -- for example, in a reorganization or a business acquisition -- the structure of the data warehouse needed to change to reflect underlying business changes. If the pace of business change was too great, the data warehouse could become unreliable for a time, undermining business confidence in the analytics produced from it.
The data warehouse is still an important part of the data pipeline. Learn about modernization approaches, data warehouse jobs and its changing role in analytics.
Data marts sprang up in order to get around this issue. But unless data marts are carefully generated in sync with data from the warehouse, they may compete with it and produce multiple versions of the truth. Various data warehouse designs arose in order to address this instability, with star schemas, snowflake schemas and others advocated by technology gurus like Bill Inmon and Ralph Kimball.
The field of master data management emerged, hoping to corral the increasingly complex set of business-context data, which often had a separate database that could operate in concert with a data warehouse. Wrangling different versions of product hierarchies required business input, leading to data governance in data warehouses to provide processes that enabled business control of such master data.
The added complexity, on top of the sheer volume of data, became a problem. Far more elaborate query and analytic tools were eventually developed, themselves often requiring their own metadata layers to represent business views on top of the data warehouse.
The whole business of extracting, transforming and loading (ETL) data spawned an industry of data integration tools. These tools automated processes and had their own proprietary scripts, adding additional components to the data warehouse environment that needed to be handled.
Data warehouse modernization
Attempts have been made over the years to bring some order to the menagerie of components that constitute an enterprise data warehouse environment. In order to modernize the increasingly creaky monolith, vendors have attempted to produce prebuilt templates and data warehouse generators. Examples of these include Idera, Magnitude and Attunity. Though undeniably successful in certain use cases, none of these have achieved market dominance.
Additionally, DevOps and DataOps endeavor to help with data warehouse schema evolution and other aspects of the whole process of making a data warehouse environment operational in a controlled way.
Despite the valiant efforts of innovative vendors, there is no silver bullet for data warehouse modernization.
Large enterprises have invested a lot of money in corporate data warehouses and their associated environments, and the sheer bulk of processes, programs, scripts and schemas is a significant barrier to moving forward. Overcoming the inertia of current practices among database administrators and IT staff is also a factor.
Since companies depend on data warehouses for much of their analytics, it is hard to migrate. Restructuring an operational data warehouse environment is like a mechanic trying to make an engine upgrade to a car in motion. Nonetheless, data warehouse automation tools and the modern DataOps market are doing their best to enable enterprises to modernize their data warehouse environments.