Is Data Warehousing Essential to Business Intelligence?

The role of data warehousing is changing as companies move toward using new business intelligence approaches.

This article originally appeared on the BeyeNETWORK.

Over the last few issues of this newsletter, I have been writing about the impact of Enterprise 2.0 approaches on business intelligence (BI). I intended this time to write about how business intelligence and data warehousing can exploit the growing amount of business content generated by Enterprise 2.0 collaborative and social networking technologies. I decided, however, that before I could do this, I first needed to review the role of data warehousing in BI projects; because as companies move toward using new BI approaches (operational BI, for example), this role is changing.

The title of this article is deliberately provocative. My objective is to encourage business intelligence professionals to consider the role of data warehousing in new BI projects. In the past, data warehousing has been the cornerstone for such projects; but I believe that in many situations, this is no longer true.

A Historical Perspective

Essentially, there are three main types of IT processing involved in running the business: business transaction processing, business intelligence processing and business collaboration processing. Business transaction applications run day-to-day business operations, while BI applications analyze those operations with the objective of optimizing and improving them. Collaboration systems enable business users to share information and expertise about business operations.

Prior to the introduction of the concept of business intelligence, most companies analyzed their business operations using decision support applications that queried and reported directly on data stored in business transaction databases. There were several problems with this approach. The five key ones are: 1) the data was not usually in a suitable form for reporting, 2) the data often had quality issues, 3) decision support processing degraded business transaction performance, 4) data was often dispersed across many different systems, and 5) there was a general lack of historical information.

Data warehousing was introduced to help solve these data and performance issues. While there is no question that data warehousing helped improve business decision making, it is important to realize, nevertheless, that it was introduced primarily to solve design issues in business transaction systems, and also for performance reasons.

The emergence of business intelligence and business performance management (BPM) applications and tools further enhanced business decision making by giving business users simpler interfaces, improved data analysis features, and the ability to compare actual and planned performance. Although business intelligence and business performance management applications typically process data in a data warehouse, this is only because of the five issues outlined earlier concerning direct access to business transaction data. If these issues could be resolved, then there would be no need for a data warehouse.

Traditional Business Intelligence

Traditionally, business intelligence has been used for many years for strategic and tactical decision making. This type of processing involves intensive analytical processing of historical and summarized data managed in an enterprise data warehouse. Data performance issues caused by centralizing data in an enterprise data warehouse have led to the creation of data marts, which solve performance problems by spreading the BI processing across multiple data stores.

The problem with data marts is that organizations often build them directly from business transaction databases, rather than the enterprise data warehouse. This is because it is often quicker and easier to build a data mart than to incorporate additional data into the enterprise data warehouse and then build the data mart from the data warehouse. Another problem is that many organizations have more than one “enterprise” data warehouse. Multiple disconnected data warehouses and data marts leads to data consistency issues, which data warehousing was supposed to solve in the first place.  

There are many articles written about the problems of building dozens of independent data marts directly from operational data, and I will not discuss these problems here. It is worth pointing out, however, that a data mart solves most of the same issues addressed by an enterprise data warehouse with the exception that decisions sourced from multiple data marts may be inconsistent in the same way that decisions based on multiple business transaction databases can be inconsistent. This multisource issue could be mitigated by first integrating the business transaction data, for example, into an operational data store or master data store. The issue of historical data would have to be addressed, but this is solvable.

Historical Data and Current Data

The distinction between current data and historical data should be easy to define, but it is not. Data in a business transaction data store is usually current, while data in a data warehouse is usually considered to be historical. The issue here is, “What is meant by current and historical?” Let’s look at an example.

If I have multiple telephone accounts with a telephone company, then (in simplistic terms) the company will have a record showing my customer data, and a record for each of my accounts listing the telephone number, account balance, billing data and so forth. When I make a telephone call, send a text message or access the Internet, I create a call data record (CDR) that can be collected with other CDRs to analyze customer calling patterns, detect fraud, etc.

At any given moment in time, the customer and account records will show the latest information about my current status. This can be considered to be current data. As this data is updated with new address and account data, the old data may be captured into a data warehouse for analysis purposes. The capturing process may be done at particular moments in a time (a snapshot), or continuously, depending on how the data will be analyzed. Regardless, the data in the data warehouse is historical.

The CDR data is a different situation. In general, once I make a phone call, the CDR for that call never changes. When a telephone company captures the CDR into a data store for analysis, is the data current or historical data? The answer is it is current data because even though the data ages over time, it is always the current version of the data. What name do we give the CDR data store? I suspect most people would call it a data mart. Is this the correct term? Does it really matter what we call it, other than the fact some people have the need to give things labels? Regardless, we most probably don’t want to keep the CDR data in an enterprise data warehouse. It is useful, however, to keep a historical record of the CDR analysis results because this shows trends and patterns over time.

If the CDR information is used to detect fraud, then the quicker the analysis can be done, the faster fraud can be detected. The business intelligence application can process the data in-flight as it flows through the system, or can analyze it in a persistent data store that is updated continuously with the CDR data. Data mining of past CDR records can help set up the business rules for doing this detection. This fraud detection application is a good example of an operational BI application.

Operational Business Intelligence

There are a growing number of operational business applications similar to those described previously for CDR analysis. The business benefit of these applications is that they can help companies become more agile by analyzing data during intra-day operations. The ultimate example of this type of processing is algorithmic trading, which employs complex event processing (CEP) and stream analytics to optimize trading operations. The response time of these analyses are a fraction of a second. In this type of processing, it is not feasible, or even necessary, to store the huge volumes of data involved in a data warehouse. The results of BI processing, however, may be kept for future use.

The model for many operational business intelligence applications is capture data, analyze data, persist results (i.e., they analyze data before persisting it). This is different from the traditional BI model of capture data, persist data, analyze data, persist results. BI applications that analyze Web traffic and business activity on commercial websites to track buying trends, optimize prices, and so forth is another example of operational BI. Although these applications analyze data in-flight, they may also use historical data in a data warehouse to assist in the analysis.              

Why is this discussion important? The main reason is at present, business intelligence is synonymous with data warehousing. This thinking is wrong and needs to be changed. Data warehousing is a component of business intelligence, but business intelligence may employ data in other data stores. In some cases, a BI application may not even use data managed in a data warehouse. The tight connection between business intelligence and data warehousing is causing terms such as virtual data warehousing and virtual BI to be used to describe other types of BI processing. These terms are unnecessary and just confuse everybody.

Another issue is that people have forgotten that data warehousing was created to overcome deficiencies in business transaction systems. Many of these issues are now solvable. My concern is that data warehousing has become a system in its own right and that companies are now extending the data warehouse into other application areas such as master data management and content management. This is completely the wrong direction and must be argued against.

The bottom line is that data warehousing is still an important component of business intelligence, but it is no longer the foundation on which all BI projects have to be built.

Dig Deeper on Data warehouse software