Home > An introduction to data warehousing and decision support systems
Chapter Download:
EMAIL THIS

An introduction to data warehousing and decision support systems

10 Jun 2009 | by Matteo Golfarelli and Stefano Rizzi

Tips, expert advice and sample chapters
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google

Data Warehouse Design book cover
This chapter excerpt from Data Warehouse Design: Modern Principles and Methodologies provides an introduction to data warehousing, explains how data warehouse technologies are used and basic data warehouse concepts. The excerpt also defines decision support systems (DSS) as well as describes what data warehousing and what a data warehouse is.

Table of contents:

An introduction to data warehousing
Data warehouse architectures, concepts and phases
The advantages of multidimensional databases and cube modeling
Best practices for data warehouse access and reports
Similarities and differences between ROLAP, MOLAP and HOLAP

Introduction to Data Warehousing

Information assets are immensely valuable to any enterprise, and because of this, these assets must be properly stored and readily accessible when they are needed. However, the availability of too much data makes the extraction of the most important information difficult, if not impossible. View results from any Google search, and you'll see that the data = information equation is not always correct—that is, too much data is simply too much.

Copyright info
Excerpted from Data Warehouse Design: Modern Principles and Methodologies by Matteo Golfarelli and Stefano Rizzi (McGraw-Hill; 2009) with permission from McGraw-Hill.
Data warehousing is a phenomenon that grew from the huge amount of electronic data stored in recent years and from the urgent need to use that data to accomplish goals that go beyond the routine tasks linked to daily processing. In a typical scenario, a large corporation has many branches, and senior managers need to quantify and evaluate how each branch contributes to the global business performance. The corporate database stores detailed data on the tasks performed by branches. To meet the managers' needs, tailor-made queries can be issued to retrieve the required data. In order for this process to work, database administrators must first formulate the desired query (typically an aggregate SQL query) after closely studying database catalogs. Then the query is processed. This can take a few hours because of the huge amount of data, the query complexity, and the concurrent effects of other regular workload queries on data. Finally, a report is generated and passed to senior managers in the form of a spreadsheet.

Many years ago, database designers realized that such an approach is hardly feasible, because it is very demanding in terms of time and resources, and it does not always achieve the desired results. Moreover, a mix of analytical queries with transactional routine queries inevitably slows down the system, and this does not meet the needs of users of either type of query. Today's advanced data warehousing processes separate online analytical processing (OLAP) from online transactional processing (OLTP) by creating a new information repository that integrates basic data from various sources, properly arranges data formats, and then makes data available for analysis and evaluation aimed at planning and decision-making processes (Lechtenbörger, 2001).

Let's review some fields of application for which data warehouse technologies are successfully used:

  • Trade: Sales and claims analyses, shipment and inventory control, customer care and public relations
  • Craftsmanship: Production cost control, supplier and order support
  • Financial services: Risk analysis and credit cards, fraud detection
  • Transport industry: Vehicle management
  • Telecommunication services: Call flow analysis and customer profile analysis
  • Health care service: Patient admission and discharge analysis and bookkeeping in accounts departments

The field of application of data warehouse systems is not only restricted to enterprises, but it also ranges from epidemiology to demography, from natural science to education. A property that is common to all fields is the need for storage and query tools to retrieve information summaries easily and quickly from the huge amount of data stored in databases or made available by the Internet. This kind of information allows us to study business phenomena, learn about meaningful correlations, and gain useful knowledge to support decision-making processes.


1.1 Decision Support Systems
Until the mid-1980s, enterprise databases stored only operational data—data created by business operations involved in daily management processes, such as purchase management, sales management, and invoicing. However, every enterprise must have quick, comprehensive access to the information required by decision-making processes. This strategic information is extracted mainly from the huge amount of operational data stored in enterprise databases by means of a progressive selection and aggregation process shown in Figure 1-1.

Figure 1.1: Information value as a function of quantity
Progressive selection and aggregation process

An exponential increase in operational data has made computers the only tools suitable for providing data for decision-making performed by business managers. This fact has dramatically affected the role of enterprise databases and fostered the introduction of decision support systems. The concept of decision support systems mainly evolved from two research fields: theoretical studies on decision-making processes for organizations and technical research on interactive IT systems. However, the decision support system concept is based on several disciplines, such as databases, artificial intelligence, man-machine interaction, and simulation. Decision support systems became a research field in the mid-'70s and became more popular in the '80s.

What is a decision support system (DSS)?
Definition: A decision support system (DSS) is a set of expandable, interactive IT techniques and tools designed for processing and analyzing data and for supporting managers in decision making. To do this, the system matches individual resources of managers with computer resources to improve the quality of the decisions made.
In practice, a DSS is an IT system that helps managers make decisions or choose among different alternatives. The system provides value estimates for each alternative, allowing the manager to critically review the results. Table 1-1 shows a possible classification of DSSs on the basis of their functions (Power, 2002).

From the architectural viewpoint, a DSS typically includes a model-based management system connected to a knowledge engine and, of course, an interactive graphical user interface (Sprague and Carlson, 1982). Data warehouse systems have been managing the data back-ends of DSSs since the 1990s. They must retrieve useful information from a huge amount of data stored on heterogeneous platforms. In this way, decision-makers can formulate their queries and conduct complex analyses on relevant information without slowing down operational systems.

Table 1-1Classification of Decision Support Systems
SystemDescription
Passive DSSSupports decision-making processes, but it does not offer explicit suggestions on decisions or solutions.
Active DSSOffers suggestions and solutions.
Collaborative DSS Operates interactively and allows decision-makers to modify, integrate, or refine suggestions given by the system. Suggestions are sent back to the system for validation.
Model-driven DSSEnhances management of statistical, financial, optimization, and simulation models.
Communication-driven DSSSupports a group of people working on a common task.
Data-driven DSSEnhances the access and management of time series of corporate and external data.
Document-driven DSSManages and processes nonstructured data in many formats.
Knowledge-driven DSSProvides problem-solving features in the form of facts, rules, and procedures.


1.2 Data Warehousing
Data warehouse systems are probably the systems to which academic communities and industrial bodies have been paying the greatest attention among all the DSSs. Data warehousing can be informally defined as follows:

What is data warehousing?
Definition: Data warehousing is a collection of methods, techniques, and tools used to support knowledge workers—senior managers, directors, managers, and analysts—to conduct data analyses that help with performing decision-making processes and improving information resources.
The definition of data warehousing presented here is intentionally generic; it gives you an idea of the process but does not include specific features of the process. To understand the role and the useful properties of data warehousing completely, you must first understand the needs that brought it into being. In 1996, R. Kimball efficiently summed up a few claims frequently submitted by end users of classic information systems:

  • "We have heaps of data, but we cannot access it!" This shows the frustration of those who are responsible for the future of their enterprises but have no technical tools to help them extract the required information in a proper format.
  • "How can people playing the same role achieve substantially different results?" In midsize to large enterprises, many databases are usually available, each devoted to a specific business area. They are often stored on different logical and physical media that are not conceptually integrated. For this reason, the results achieved in every business area are likely to be inconsistent.
  • "We want to select, group, and manipulate data in every possible way!" Decision-making processes cannot always be planned before the decisions are made. End users need a tool that is user-friendly and flexible enough to conduct ad hoc analyses. They want to choose which new correlations they need to search for in real time as they analyze the information retrieved.
  • "Show me just what matters!" Examining data at the maximum level of detail is not only useless for decision-making processes, but is also self-defeating, because it does not allow users to focus their attention on meaningful information.
  • "Everyone knows that some data is wrong!" This is another sore point. An appreciable percentage of transactional data is not correct—or it is unavailable. It is clear that you cannot achieve good results if you base your analyses on incorrect or incomplete data.

We can use the previous list of problems and difficulties to extract a list of key words that become distinguishing marks and essential requirements for a data warehouse process, a set of tasks that allow us to turn operational data into decision-making support information:

  • accessibility to users not very familiar with IT and data structures;
  • integration of data on the basis of a standard enterprise model;
  • query flexibility to maximize the advantages obtained from the existing information;
  • information conciseness allowing for target-oriented and effective analyses;
  • multidimensional representation giving users an intuitive and manageable view of information;
  • correctness and completeness of integrated data.

Data warehouses are placed right in the middle of this process and act as repositories for data. They make sure that the requirements set can be fulfilled.

What is a data warehouse?
Definition: A data warehouse is a collection of data that supports decision-making processes. It provides the following features (Inmon, 2005):

• It is subject-oriented.
• It is integrated and consistent.
• It shows its evolution over time and it is not volatile.

Data warehouses are subject-oriented because they hinge on enterprise-specific concepts, such as customers, products, sales, and orders. On the contrary, operational databases hinge on many different enterprise-specific applications.

We put emphasis on integration and consistency because data warehouses take advantage of multiple data sources, such as data extracted from production and then stored to enterprise databases, or even data from a third party's information systems. A data warehouse should provide a unified view of all the data. Generally speaking, we can state that creating a data warehouse system does not require that new information be added; rather, existing information needs rearranging. This implicitly means that an information system should be previously available.

Operational data usually covers a short period of time, because most transactions involve the latest data. A data warehouse should enable analyses that instead cover a few years. For this reason, data warehouses are regularly updated from operational data and keep on growing. If data were visually represented, it might progress like so: A photograph of operational data would be made at regular intervals. The sequence of photographs would be stored to a data warehouse, and results would be shown in a movie that reveals the status of an enterprise from its foundation until present.

Fundamentally, data is never deleted from data warehouses and updates are normally carried out when data warehouses are offline. This means that data warehouses can be essentially viewed as read-only databases. This satisfies the users' need for a short analysis query response time and has other important effects. First, it affects data warehouse–specific database management system (DBMS) technologies, because there is no need for advanced transaction management techniques required by operational applications. Second, data warehouses operate in read-only mode, so data warehouse–specific logical design solutions are completely different from those used for operational databases. For instance, the most obvious feature of data warehouse relational implementations is that table normalization can be given up to partially denormalize tables and improve performance.

Other differences between operational databases and data warehouses are connected with query types. Operational queries execute transactions that generally read/write a small number of tuples from/to many tables connected by simple relations. For example, this applies if you search for the data of a customer in order to insert a new customer order. This kind of query is an OLTP query. On the contrary, the type of query required in data warehouses is OLAP. It features dynamic, multidimensional analyses that need to scan a huge amount of records to process a set of numeric data summing up the performance of an enterprise. It is important to note that OLTP systems have an essential workload core "frozen" in application programs, and ad hoc data queries are occasionally run for data maintenance. Conversely, data warehouse interactivity is an essential property for analysis sessions, so the actual workload constantly changes as time goes by. The distinctive features of OLAP queries suggest adoption of a multidimensional representation for data warehouse data. Basically, data is viewed as points in space, whose dimensions correspond to many possible analysis dimensions. Each point represents an event that occurs in an enterprise and is described by a set of measures relevant to decision-making processes. Section 1.5 gives a detailed description of the multidimensional model you absolutely need to be familiar with to understand how to model conceptual and logical levels of a data warehouse and how to query data warehouses.

Table 1-2 summarizes the main differences between operational databases and data warehouses.

Note: For further details on the different issues related to the data warehouse process, refer to Chaudhuri and Dayal, 1997; Inmon, 2005; Jarke et al., 2000; Kelly, 1997; Kimball, 1996; Mattison, 2006; and Wrembel and Koncilia, 2007.

Table 1-2: Differences Between Operational Databases and Data Warehouses (Kelly, 1997)
FeatureOperational DatabasesData Warehouses
UsersThousandsHundreds
WorkloadPreset transactionsSpecific analysis queries
AccessTo hundreds of records, write and read modeTo millions of records, mainly read-only mode
GoalDepends on applicationsDecision-making support
DataDetailed, both numeric and alphanumericSummed up, mainly numeric
Data integrationApplication-basedSubject-based
QualityIn terms of integrityIn terms of consistency
Time coverageCurrent data onlyCurrent and historical data
UpdatesContinuousPeriodical
ModelNormalizedDenormalized, multidimensional
OptimizationFor OLTP access to a database partFor OLAP access to most of the database

More on data warehouse design:



Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Data warehouse basics
Definition of primary, super, foreign and candidate key in the DBMS
What is the difference between a logical and physical warehouse design?
Best practices for data warehouse access and reports
Similarities and differences between ROLAP, MOLAP and HOLAP
Data warehouse architectures, concepts and phases
Advantages of the multidimensional database model and cube modeling
Data warehousing, data mining and data querying: Terms and definitions
What is an operational data store vs. a data warehouse?
Why you should build a data warehouse
Data mart vs. data warehouse

Data management tutorials
Seven secrets to business intelligence (BI) success
How to design project management plan templates for large projects
Handling project management issues for a proprietary ownership conflict
Should we use traditional or agile software development processes?
Next steps for business intelligence: Video
Improving ODBC application performance and coding
Selecting ODBC functions for optimized SQL statements
Guidelines for managing data updates to optimize ODBC performance
Is it better to have a centralized or decentralized master data structure?
Can we leverage existing data quality tools for an MDM program?

Enterprise data architecture best practices
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Shredding XML docs into relational tables with annotated XML schemas
Teradata takes a logical approach to data warehousing appliances
Examples of single and bulk XML shredding of XML documents
What is the difference between a logical and physical warehouse design?
What are some emerging data warehouse and DBMS trends?
Teradata VP talks data warehouse appliances, reveals cloud and SSD plans
Selecting ODBC functions for optimized SQL statements
Guidelines for managing data updates to optimize ODBC performance

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary




Data Compliance Articles and Research: Data Privacy, Financial Data Management, Healthcare Data
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts