The business intelligence (BI) and data warehouse software world is a dazzling display of new products, paradigms and press releases. Some are old snake oil in new bottles. Others are genuinely new and, of those, some are even useful. The problem is, as always, stepping back far enough to see the meaningful patterns, to identify those new technologies that will actually make a difference to your work in BI. I zoomed out as far as possible to identify the technology trends that are truly shaping BI.
In Part 1 of this series, I talked to BI and data warehouse vendors about the changes occurring in the market and in their products. I also looked at a case study on the Hudson's Bay Company, which recently rolled out an updated BI system. In Part 2, below, I will examine the technical challenges of third-generation BI and data warehousing, discussing such issues as combining analytical and transactional queries.
OK, so as I discussed in Part 1, it seems that third-generation BI and data warehouse systems are the future. But third-generation BI requires our systems to run both huge analytical queries and transaction-like queries -- such as those from customer-facing employees -- with equal aplomb.
This is a problem we have seen before. For the last decade or so, we solved this problem by separating the analytical and transaction systems, because combining them was too difficult. Very early on in the history of BI, we learned that analytical and transactional queries made very different demands on the query engines. So great were the differences that we made copies of the transactional data, moved it into the data warehouse and restructured it there in order to allow the analytical queries to run more effectively. Now, we are expecting our third-generation BI systems to cope with analytical and transaction-like queries in the same system.
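To make the contrast between the two kinds of query concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a real query engine, and the `orders` table, its columns and the two helper functions are hypothetical names invented for illustration. The transaction-like query touches a single row by key, while the analytical query has to read and aggregate every row.

```python
import sqlite3

# In-memory database used only for illustration; a real warehouse would
# hold far more data and use a very different engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "region TEXT, amount REAL)"
)

# Hypothetical sample data: 10,000 orders spread over four regions.
REGIONS = ("north", "south", "east", "west")
rows = [
    (i, i % 100, REGIONS[i % 4], float(i % 50))
    for i in range(1, 10001)
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)


def lookup_order(order_id):
    """Transaction-like query: fetch one order by its primary key."""
    return conn.execute(
        "SELECT customer_id, region, amount FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def revenue_by_region():
    """Analytical query: scan and aggregate the whole table."""
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    )
```

Calling `lookup_order(42)` returns a single row almost regardless of table size, whereas the cost of `revenue_by_region()` grows with every row added. That difference in access pattern is what pushed the two workloads onto separate systems in the first place.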
How has business intelligence and data warehouse software changed in the interim?
Such third-generation systems have learned from the data warehouse appliances, so they can and do use massively parallel processing (MPP) and in-memory querying. But query workload management is also important. This technology carefully distributes the resources available to the system across the individual queries -- ensuring, for example, that time-critical queries, such as those from customer-facing employees, are allocated enough resources to allow them to complete in a timely fashion. Vendors such as Hewlett-Packard believe this is very important, according to Greg Battas, distinguished technologist, BI group with HP.
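As a rough illustration of the workload management idea, the sketch below divides a fixed pool of resource "slots" among running queries and reserves a share for time-critical ones. It is a toy policy under stated assumptions, not any vendor's actual algorithm: the `Query` class, the `allocate_slots` function and the `critical_share` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    time_critical: bool  # e.g. issued by a customer-facing employee


def allocate_slots(queries, total_slots, critical_share=0.5):
    """Split total_slots across queries (hypothetical policy).

    Time-critical queries share a reserved fraction of the slots; the
    remaining slots are shared by everything else. If only one class of
    query is present, it receives all the slots.
    """
    critical = [q for q in queries if q.time_critical]
    batch = [q for q in queries if not q.time_critical]

    if critical and batch:
        reserved = int(total_slots * critical_share)
    else:
        reserved = total_slots if critical else 0

    plan = {}
    for group, slots in ((critical, reserved), (batch, total_slots - reserved)):
        if not group:
            continue
        # Divide evenly; hand any remainder to the first queries in the group.
        share, extra = divmod(slots, len(group))
        for i, q in enumerate(group):
            plan[q.name] = share + (1 if i < extra else 0)
    return plan
```

With 16 slots, one time-critical lookup and three analytical reports, the lookup receives 8 slots and the reports split the other 8 between them. However many large reports are queued, the time-critical query keeps its reserved share, which is the guarantee the paragraph above describes.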
"HP sees workload management as absolutely vital," Battas said. "To put that into some kind of perspective, about 25% of our database development people are currently working on workload management."
The market is also seeing new hybrid systems emerging from vendors like IBM and HP with characteristics of both the traditional data warehouse and the data warehouse appliance.
The three core technologies driving this change are:
- MPP on commodity hardware
- In-memory querying
- Query workload management
These hybrid systems serve the needs not only of analysts but also of a whole new layer of BI users within the company. BI is finally moving from elite to egalitarian and perhaps, as analysts have long predicted, becoming pervasive throughout the enterprise.
The front end of business intelligence software -- and the audience -- changes
As third-generation systems move BI out to a much wider range of employees, it must become integrated into the software stack that those employees use on a daily basis. In some cases, this means rewriting custom software, but for many it means integrating BI into Microsoft's Office applications.
To this end, Microsoft has put massive efforts into BI, integrating Office 2007 with its own back-end BI tools. Analysis Services cubes can appear in Excel, and you can data mine from within Excel or even (the mind does boggle a little here) from within Visio. However, Microsoft is simply one BI vendor; there are plenty of others out there. Somewhat surprisingly, Microsoft and other vendors, such as Teradata, are working together.
This cooperation not only allows Office components to reach data stored in Teradata, it also allows Microsoft's BI tools to do the same. Given the diversity of BI vendors, this is not a phenomenon that's going to disappear and, in this case, it is service-oriented architecture (SOA) that is the key enabling technology. SOA has been essential in facilitating the cooperation, according to Ed White, director of product marketing with Teradata.
Bar charts, pie charts, yawn charts: Data visualization is changing
Pie charts date back to about 1800 and, useful as they are, we can do better, as the work of respected data visualization researchers Edward Tufte and William Cleveland shows. Several companies -- including Spotfire (now part of Tibco), QlikTech, Thinkmap, Tableau and others -- have been looking at this work and producing truly original ways of displaying complex data. I believe that this will have a profound influence on BI over the coming years, and others -- such as Roger Oberg, vice president, Spotfire product strategy with Palo Alto, Calif.-based Tibco Software Inc. -- agree with me.
"New trends like in-memory processing, 'free dimensional' ad hoc queries, and user definable workflows are democratizing BI," Oberg said. "We are moving from a world in which we push data that is often ignored to a world in which interaction massively increases the data's usefulness and therefore the number of people who want to use it."
As he points out, none of that will be effective unless people can visualize the data easily.
Yes, but what does it all mean for my business intelligence and data warehouse software?
Data is just numbers and text. One important lesson we have learned from BI is that keeping track of the meaning of the data is far, far more complex than we originally thought. In one sense, this isn't a technical problem, it's a human one -- since only humans can decide questions of meaning. However, some companies have been actively trying to address not only how we track meaning but how we track it over time, according to Cliff Longman, chief technology officer with Burlington, Mass.-based Kalido Inc.
"Fluidity of meaning is the problem that Kalido addresses," Longman said. "We find that users can get results if they are allowed to deal with a higher level of abstraction – higher than the logical model. Kalido makes data reusable even if the meaning changes over time."
And finally, though technical approaches differ, everyone I spoke to agreed on two important, related points:
1. Data volumes are growing, year on year. Two years ago, data warehouse vendor Kognitio was looking to scale its systems down to 200 GB for some customers, according to Roger Gaskell, product development director. Now, almost every proof-of-concept Kognitio does is above 5 TB, with most in the 50 to 250 TB range.
2. BI is no longer the preserve of the large enterprise; it has moved to the small/medium-sized enterprise. Ten years ago, Microsoft's vision was "BI for the masses," according to Amir Netz, product unit manager for Analysis Services -- and in recent years, other experts have often espoused the benefits of BI for small and medium-sized businesses.
That day has finally arrived, it seems, thanks to many of the technologies covered here, which, either directly or indirectly, have helped to achieve these profound changes in the world of BI and data warehousing.
About the author: Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling, data warehousing and business intelligence (BI). Based in the U.K., he works as a consultant for a number of national and international companies, designing databases and BI systems. In addition to his consultancy practice, he is a well-recognized commentator on the computer world, publishing about 150,000 words a year in the form of articles (in publications such as PCW and Server Management Magazine), white papers and books. He has written nine books on database and BI technology. The first one, "Inside Relational Databases" (1997), is now in its third edition and has been translated into three other languages. The most recent is about MDX (a language for manipulating multi-dimensional data structures) and was co-written with the original architect of the language, Mosha Pasumansky. Mark has also worked as an associate with QA-IQ since 2000. He developed the company's database analysis and design course as well as its data warehousing course.
Don't miss Part 1 of this series, in which Whitehorn talked to BI and data warehouse vendors about the changes occurring in their products and looked at a case study on the Hudson's Bay Company, which recently rolled out an updated BI system.