Don't look now, but your data warehouse vendor may not be telling you the whole truth when it comes to the appliance...
While data warehouse appliances do minimize the amount of hardware configuration needed, they don't "automatically" fix other problems associated with your data warehouse design and performance, according to Donald Feinberg, an analyst with Stamford, Conn.-based Gartner.
But that's the message, or myth, many data warehouse vendors – including Oracle, IBM, Netezza and others – are pushing, Feinberg said.
"They will make a claim to somebody that if you've got problems in the performance of your data warehouse – meaning you're not meeting service-level agreements (SLAs) with your end users – if you put in an appliance, it will fix your problems," he said. "Quite honestly, that's not true."
Feinberg, who will deliver one of the keynote addresses at next month's Gartner BI Summit in Las Vegas, said a traditional data warehouse with incorrect summary and aggregate data, or one lacking enough cubes for analytics and data mining, can't be fixed simply by transferring its content and design to an appliance.
"If I've got a very poorly designed data warehouse and I move that poorly designed data warehouse to an appliance, there is nothing that the appliance makes automatically work better," Feinberg said.
In other words, there's no quick fix for a poorly designed data warehouse that begins to strain under an increased workload. And workloads are indeed on the rise, as they have been for several years.
Data warehouses were originally designed as reporting tools with data integrated in batch form, often just once a week, Feinberg said. Over the years, however, front-end business intelligence tools have matured and demands for more frequent and real-time data integration have grown. As a result, data warehouse performance has suffered.
"We've gone from a situation where we're running a few reports to a situation where we're running thousands of reports, and the number of people doing sophisticated queries or ad hoc queries has gone up in huge numbers," he said. "Users are starting to see performance issues," most notably queries taking longer and longer to complete.
Just throwing more hardware at the problem in the form of a data warehouse appliance doesn't change that. It might alleviate performance problems temporarily, Feinberg said, but they will return with a vengeance when the number of users increases – as it surely will.
A better way to overcome these performance problems is to prioritize data warehouse use, he said. High-priority reports and queries should be given preference over less important ones.
"It's not just who can do my query the fastest or load my data the fastest … but who can give me my complete workload with the least investment," he said. "And that's where workload management comes in."
Unfortunately, only one vendor – Teradata – has a truly comprehensive workload management offering, according to Feinberg. Competitors like Oracle, IBM and Microsoft are starting to catch up, however, and they could have offerings comparable to Teradata's on the market within a couple of years, he said.
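To make the prioritization idea concrete, here is a minimal Python sketch of a queue that releases high-priority reports before less important ones. It is an illustration only, not how Teradata or any other vendor implements workload management; the class names, the priority scale (lower number means more important) and the sample query names are all assumptions made for this example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedQuery:
    # Hypothetical record: ordering uses priority first, then arrival order.
    priority: int                    # assumed scale: 1 = most important
    seq: int                         # arrival counter, breaks ties first-in first-out
    name: str = field(compare=False)


class WorkloadQueue:
    """Toy scheduler that always releases the most important waiting query."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, priority):
        # Push the query onto a min-heap keyed by (priority, arrival order).
        heapq.heappush(self._heap, QueuedQuery(priority, next(self._counter), name))

    def next_query(self):
        # Return the name of the next query to run, or None if nothing is waiting.
        return heapq.heappop(self._heap).name if self._heap else None

    def drain(self):
        # Return every waiting query in the order it would be run.
        order = []
        while self._heap:
            order.append(self.next_query())
        return order


queue = WorkloadQueue()
queue.submit("ad hoc exploration", priority=3)
queue.submit("executive SLA report", priority=1)
queue.submit("weekly batch summary", priority=2)
queue.submit("finance SLA report", priority=1)
print(queue.drain())
```

In this sketch the two SLA reports run first, in the order they arrived, followed by the batch summary and then the ad hoc query. A real workload manager would also weigh resource consumption and concurrency limits, which this toy version ignores.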
"The gap in workload management is closing," he said.
Another option is to outsource data warehouse management to a vendor like Kognitio or 1010data. Vendors such as these host customers' data warehouses in their own data centers, which are designed for optimal data warehouse performance.
This is a good option for companies that lack IT resources to host and manage their data warehouse internally or that want to free up IT to work on more pressing problems, Feinberg said. But while hosting can improve data warehouse performance with little internal effort, it also has drawbacks.
Costs can quickly spiral out of control, for instance. With an on-premises data warehouse, IT controls how many users may access the system and for how long. But in a hosted environment, he said, "one person can blow the whole monthly budget in two days."
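One way a customer might guard against that kind of runaway bill is to cap how much of the monthly budget any single user can consume. The Python sketch below shows the idea under stated assumptions: the `BudgetGuard` class, the 25% per-user share and the dollar figures are hypothetical, and no hosted vendor's actual billing controls are represented here.

```python
class BudgetGuard:
    """Toy spending guard for a hosted data warehouse (all figures hypothetical)."""

    def __init__(self, monthly_budget, per_user_share=0.25):
        self.monthly_budget = monthly_budget
        # Assumed policy: no single user may spend more than this share of the budget.
        self.per_user_cap = monthly_budget * per_user_share
        self.spent_by_user = {}

    def total_spent(self):
        return sum(self.spent_by_user.values())

    def try_charge(self, user, cost):
        # Approve the query only if both the user's cap and the overall budget still hold.
        user_spent = self.spent_by_user.get(user, 0.0)
        if user_spent + cost > self.per_user_cap:
            return False
        if self.total_spent() + cost > self.monthly_budget:
            return False
        self.spent_by_user[user] = user_spent + cost
        return True


guard = BudgetGuard(monthly_budget=1000.0)
print(guard.try_charge("alice", 200.0))  # within alice's cap
print(guard.try_charge("alice", 100.0))  # would push alice past her cap
print(guard.try_charge("bob", 250.0))    # bob still has his full share
```

With a check like this in front of the hosted system, one heavy user exhausts only their own share rather than the whole month's budget.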