This book excerpt from Data Warehouse Design: Modern Principles and Methodologiescovers best practices and techniques for querying and accessing data warehouses, including running reports, OLAP sessions and dashboards. It also covers common OLAP operators including roll-up, drill-down, slice-and-dice, pivot, drill-across and drill-through.
Table of contents:
An introduction to data warehousing
Data warehouse architectures, concepts and phases
The advantages of multidimensional databases and cube modeling
Best practices for data warehouse access and reports
Similarities and differences between ROLAP, MOLAP and HOLAP
1.7 Accessing Data Warehouses
Analysis is the last level common to all data warehouse architecture types. After cleansing, integrating, and transforming data, you should determine how to get the best out of it in terms of information. The following sections show the best approaches for end users to query data warehouses: reports, OLAP, and dashboards. End users often use the information stored to a data warehouse as a starting point for additional business intelligence applications, such as what-if analyses and data mining. See Chapter 15 for more details on these advanced applications.
This approach is oriented to those users who need to have regular access to the information in an almost static way. For example, suppose a local health authority must send to its state offices monthly reports summing up information on patient admission costs. The layout of those reports has been predetermined and may vary only if changes are applied to current laws and regulations. Designers issue the queries to create reports with the desired layout and "freeze" all those in an application. In this way, end users can query current data whenever they need to.
A report is defined by a query and a layout. A query generally implies a restriction and an aggregation of multidimensional data. For example, you can look for the monthly receipts during the last quarter for every product category. A layout can look like a table or a chart (diagrams, histograms, pies, and so on). Figure 1-16 shows a few examples of layouts for the receipts query.
A reporting tool should be evaluated not only on the basis of comprehensive report layouts, but also on the basis of flexible report delivery systems. A report can be explicitly run by users or automatically and regularly sent to registered end users. For example, it can be sent via e-mail.
Keep in mind that reports existed long before data warehouse systems came to be. Reports have always been the main tool used by managers for evaluating and planning tasks since the invention of databases. However, adding data warehouses to the mix is beneficial to reports for two main reasons: First, they take advantage of reliable and correct results because the data summed up in reports is consistent and integrated. In addition, data warehouses expedite the reporting process because the architectural separation between transaction processing and analyses significantly improves performance.
OLAP might be the main way to exploit information in a data warehouse. Surely it is the most popular one, and it gives end users, whose analysis needs are not easy to define beforehand, the opportunity to analyze and explore data interactively on the basis of the multidimensional model. While users of reporting tools essentially play a passive role, OLAP users are able to start a complex analysis session actively, where each step is the result of the outcome of preceding steps. Real-time properties of OLAP sessions, required in-depth knowledge of data, complex queries that can be issued, and design for users not familiar with IT make the tools in use play a crucial role. The GUI of these tools must be flexible, easy-to-use, and effective.
More on enterprise data warehouse best practices
Find out how in-database analytics is helping a marketing company fight customer churn
Read about the rising demand for data warehouse services seen in Gartner's Magic Quadrant
Learn best practices for avoiding Hadoop performance bottlenecks
An OLAP session consists of a navigation path that corresponds to an analysis process for facts according to different viewpoints and at different detail levels. This path is turned into a sequence of queries, which are often not issued directly, but differentially expressed with reference to the previous query. The results of queries are multidimensional. Because we humans have a difficult time deciphering diagrams of more than three dimensions, OLAP tools typically use tables to display data, with multiple headers, colors, and other features to highlight data dimensions.
Every step of an analysis session is characterized by an OLAP operator that turns the latest query into a new one. The most common operators are roll-up, drill-down, slice-and-dice, pivot, drill-across, and drill-through. The figures included here show different operators, and were generated using the MicroStrategy Desktop front-end application in the MicroStrategy 8 tool suite. They are based on the V-Mall example, in which a large virtual mall sells items from its catalog via phone and the Internet. Figure 1-17 shows the attribute hierarchies relevant to the sales fact in V-Mall.
The roll-up operator causes an increase in data aggregation and removes a detail level from a hierarchy. For example, Figure 1-18 shows a query posed by a user that displays monthly revenues in 2005 and 2006 for every customer region. If you "roll it up," you remove the month detail to display quarterly total revenues per region. Rolling-up can also reduce the number of dimensions in your results if you remove all the hierarchy details. If you apply this principle to Figure 1-19, you can remove information on customers and display yearly total revenues per product category as you turn the three-dimensional table into a two-dimensional one. Figure 1-20 uses the cube metaphor to sketch a roll-up operation with and without a decrease in dimensions.
The drill-down operator is the complement to the roll-up operator. Figure 1-20 shows that it reduces data aggregation and adds a new detail level to a hierarchy. Figure 1-21 shows an example based on a bidimensional table. This table shows that the aggregation based on customer regions shifts to a new fine-grained aggregation based on customer cities.
In Figure 1-22, the drill-down operator causes an increase in the number of table dimensions after adding customer region details.
Slice-and-dice is one of the most abused terms in data warehouse literature because it can have many different meanings. A few authors use it generally to define the whole OLAP navigation process. Other authors use it to define selection and projection operations based on data. In compliance with section 1.5.1, we define slicing as an operation that reduces the number of cube dimensions after setting one of the dimensions to a specific value. Dicing is an operation that reduces the set of data being analyzed by a selection criterion (Figure 1-23). Figures 1-24 and 1-25 show a few examples of slicing and dicing.
The pivot operator implies a change in layouts. It aims at analyzing an individual group of information from a different viewpoint. According to the multidimensional metaphor, if you pivot data, you rotate your cube so that you can rearrange cells on the basis of a new perspective. In practice, you can highlight a different combination of dimensions (Figure 1-26). Figures 1-27 and 1-28 show a few examples of pivoted two-dimensional and three-dimensional tables.
The term drill-across stands for the opportunity to create a link between two or more interrelated cubes in order to compare their data. For example, this applies if you calculate revenues and discounts an expression involving measures from two cubes (Figure 1-29). Figure 1-30 shows an example in which a sales cube is drilled-across a promotions cube in order to compare per quarter and product category.
Most OLAP tools can perform drill-through operations, though with varying effectiveness. This operation switches from multidimensional aggregate data in data marts to operational data in sources or in the reconciled layer.
In many applications, an intermediate approach between static reporting and OLAP is broadly used. This intermediate approach is called semi-static reporting. Even if a semi-static report focuses on a group of information previously set, it gives users some margin of freedom. Thanks to this margin, users can follow a limited set of navigation paths. For example, this applies when you can roll up just to a few hierarchy attributes. This solution is common, because it provides some unquestionable advantages. First, users need less skill to use data models and analysis tools than they need for OLAP. Second, this avoids the risk that occurs in OLAP of achieving inconsistent analysis results or incorrect ones because of any misuse of aggregation operators. Third, if you pose constraints on the analyses allowed, you will prevent users from unwillingly slowing down your system whenever they formulate demanding queries.
Dashboards are another method used for displaying information stored to a data warehouse. The term dashboard refers to a GUI that displays a limited amount of relevant data in a brief and easy-to-read format. Dashboards can provide a real-time overview of the trends for a specific phenomenon or for many phenomena that are strictly connected with each other. The term is a visual metaphor: the group of indicators in the GUI are displayed like a car dashboard. Dashboards are often used by senior managers who need a quick way to view information. However, to conduct and display very complex analyses of phenomena, dashboards must be matched with analysis tools.
Today, most software vendors offer dashboards for report creation and display. Figure 1-31 shows a dashboard created with MicroStrategy Dynamic Enterprise. The literature related to dashboard graphic design has also proven to be very rich, in particular in the scope of enterprises (Few, 2006).
Keep in mind, however, that dashboards are nothing but performance indicators behind GUIs. Their effectiveness is due to a careful selection of the relevant measures, while using data warehouse information quality standards. For this reason, dashboards should be viewed as a sophisticated effective add-on to data warehouse systems, but not as the primary goal of data warehouse systems. In fact, the primary goal of data warehouse systems should always be to properly define a process to transform data into information.
More on data warehouse design:
- Continue to the next section: Similarities and differences between ROLAP, MOLAP and HOLAP
- Download a free PDF of this chapter: "An Introduction to Data Warehousing"
- To purchase the book or similar titles, visit McGraw-Hill
- Read other excerpts from data management books in the Chapter Download Library.