The business intelligence (BI) and data warehouse software world is a dazzling display of new products, paradigms and press releases. Some are old snake oil in new bottles. Others are genuinely new and, of those, some are even useful. The problem is, as always, stepping back far enough to see the meaningful patterns, to identify those new technologies that will actually make a difference to your work in BI. I zoomed out as far as possible to identify the technology trends that are truly shaping BI.

In Part 1 of this series, below, I talked to BI and data warehouse vendors about the changes occurring in the market and in their products. I also looked at a case study on the Hudson's Bay Company, which recently rolled out an updated BI system. In Part 2, I will examine the technical challenges of third-generation BI and data warehousing, discussing such issues as combining analytical and transactional queries.
Technology is worthy of your attention when it solves an existing problem. Ten years ago, the big concern was simply getting our BI systems to work. For most enterprises, this is no longer the main issue, and we are now looking at ways to simplify and streamline our functional systems, which brings us to the data warehouse appliance.
Data warehouse appliances: A panacea?
Easy to configure, already optimized and highly scalable, data warehouse appliances are not the panacea for all BI problems but, where applicable, they have already proved their mettle. Championed by companies like Framingham, Mass.-based Netezza Corp.; U.K.-based Kognitio Ltd.; and Aliso Viejo, Calif.-based DATAllegro Inc., their use of commodity massively parallel processing (MPP) and/or in-memory querying has changed the landscape forever. This is not simply because enterprises are deploying data warehouse appliances but because these new products are significantly altering mainstream BI.
That's similar to the view of Greg Battas, a distinguished technologist in the BI group at Hewlett-Packard.
"Data warehouse appliances are a disruptive force which addresses a problem that hasn't been addressed before. They have taught us that flexibility is the key," Battas said.
This view led to the development and release of HP's Neoview data warehouse, he said.
IBM agrees with this notion, according to Marc Andrews, program director of data warehousing for Big Blue. The success of data warehouse appliances has taught vendors that pre-configured, pre-optimized solutions are a great value proposition, Andrews said. IBM has responded by adding its Balanced Warehouse line to its portfolio, essentially bringing data warehouse appliance features into a very traditional BI environment.
Why BI had to evolve
The traditional face of BI needs to change because the way in which we are using BI is changing. For a start, analysis is getting more complex, according to Ellen Rubin, vice president of marketing with data warehouse appliance vendor Netezza.
"Until recently, [our customers] have been looking for fraud after it has occurred," Rubin said. "Now we can look much further into the future, asking not just 'What patterns can we see in the data?' but 'What patterns are we likely to see in the future?' "
But it is also about adding a very different type of usage, according to IBM's Andrews.
"We would characterize BI as having three generations," he said. "The first generation was about understanding the past. The second was about analyzing why things happened and making recommendations about the future. That's better than first, but I still liken this to driving a car by looking in the rear view mirror. The new, third generation is about making information available to the people in front of the customer."
This is a truly significant shift in the way enterprises use data warehouses. First- and second-generation systems needed to support a limited number of people who ran large, complex analytical queries. The third generation must support not only more complex queries from the same analysts but also a new workload that consists of thousands of users running very different queries. These may well be complex, but each is likely to hit a relatively small set of data within the warehouse.
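The difference between the two workloads is easiest to see side by side. The sketch below uses a hypothetical retail schema (table and column names are my own, not from any system described in this article) to contrast a broad analytical scan with the kind of narrow, targeted lookup a third-generation system must serve thousands of times a day:

```python
import sqlite3

# Hypothetical retail schema, used only to illustrate the two workload types.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    store_id INTEGER, receipt_id TEXT, sale_date TEXT, amount REAL)""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "R-001", "2007-01-05", 120.0),
     (1, "R-002", "2007-01-06", 35.5),
     (2, "R-003", "2007-01-06", 80.0)])

# Second-generation workload: a broad analytical aggregation run by a
# handful of analysts -- it scans the whole table.
analytical = conn.execute(
    "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id").fetchall()

# Third-generation workload: a targeted lookup issued constantly by
# front-line staff -- it touches only one receipt's rows.
operational = conn.execute(
    "SELECT amount FROM sales WHERE receipt_id = ?", ("R-002",)).fetchone()

print(analytical)   # per-store totals from the full scan
print(operational)  # the single receipt the point query needs
```

Running both query shapes against one warehouse is exactly the scheduling and resource-management problem the next paragraph raises.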
Combining these very different workloads is non-trivial. So is it worth the effort? The Hudson's Bay Company, based in Toronto, certainly thinks so.
Hudson's Bay Company embraces third-generation BI
Sadly, not all shoppers are honest. The National Retail Federation estimates that retailers lose about $16 billion a year to returns fraud -- dishonest customers presenting stolen items for refund or using a sales receipt multiple times. Detecting the patterns of fraudulent returns after they have occurred is second-generation territory. But catching the offender at the checkout with the receipt in his hand illustrates very clearly the difference that a third-generation system can make.
By combining a data warehouse from Dayton, Ohio-based Teradata with IBM's WebSphere as middleware and BI software from McLean, Va.-based MicroStrategy Inc., the Hudson's Bay Company was able to update the data warehouse with sale, return, exchange and void data almost instantaneously. Essentially, it is now impossible for a receipt to be reused or for merchandise to be returned fraudulently.
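The mechanism that makes reuse impossible is conceptually simple: because each return is written back to the warehouse as it happens, a receipt can be atomically marked as redeemed, and any second attempt fails. The toy sketch below illustrates the idea only; the schema, function and logic are my own assumptions, not the Hudson's Bay implementation:

```python
import sqlite3

# Toy model of a real-time receipt check. Every return updates the
# warehouse immediately, so a second redemption of the same receipt
# can be rejected while the customer is still at the checkout.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE receipts (
    receipt_id TEXT PRIMARY KEY, amount REAL, redeemed INTEGER DEFAULT 0)""")
conn.execute(
    "INSERT INTO receipts (receipt_id, amount) VALUES ('R-100', 49.99)")

def process_return(receipt_id):
    """Authorize a refund only if this receipt has never been used before."""
    # The UPDATE succeeds for exactly one row the first time and zero rows
    # on any reuse attempt, so the check and the state change are atomic.
    cur = conn.execute(
        "UPDATE receipts SET redeemed = 1 "
        "WHERE receipt_id = ? AND redeemed = 0", (receipt_id,))
    conn.commit()
    return cur.rowcount == 1

print(process_return("R-100"))  # first return is authorized -> True
print(process_return("R-100"))  # reuse is caught -> False
```

The key design point is that the check and the state change happen in one statement, so even two simultaneous return attempts cannot both succeed.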
These third-generation systems are not pie in the sky. The Hudson's Bay Company rolled out this system across all its stores and, within five months, the savings had delivered a 100% ROI, according to Mary-Jane Jarvis-Haig, senior manager of business intelligence development and support for Hudson's Bay.
"The cost savings have been huge," Jarvis-Haig said. "We're already exceeding our targeted benefits."
About the author: Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling, data warehousing and business intelligence (BI). Based in the U.K., he works as a consultant for a number of national and international companies, designing databases and BI systems. In addition to his consultancy practice, he is a well-recognized commentator on the computer world, publishing about 150,000 words a year in the form of articles (in publications such as PCW and Server Management Magazine), white papers and books. He has written nine books on database and BI technology. The first, "Inside Relational Databases" (1997), is now in its third edition and has been translated into three other languages. The most recent is about MDX (a language for manipulating multi-dimensional data structures) and was co-written with Mosha Pasumansky, the original architect of the language. Mark has also worked as an associate with QA-IQ since 2000. He developed the company's database analysis and design course as well as its data warehousing course.
Don't miss Part 2 of this series, in which Whitehorn looks at the technical challenges of third-generation BI and data warehousing, discussing such issues as combining analytical and transactional queries.