Data warehouse appliances come in many different incarnations, and wading through the myriad vendor and analyst definitions of data warehouse appliances can be a highly confusing task. But in the end, arriving at a definition for this business intelligence (BI) technology isn't nearly as important as determining what data warehouse appliances can do for your organization, according to Michael A. Schiff, founder and principal of MAS Strategies, a BI consultancy, and Kim Stanick, vice president of marketing with ParAccel Inc., a San Diego, Calif.-based analytic database vendor. They serve as instructors with The Data Warehousing Institute (TDWI) and taught a course on 'Demystifying Data Warehouse Appliances' at a recent TDWI conference.
In this interview, Schiff and Stanick arrive at a working definition of data warehouse appliance, explain the benefits and pitfalls of data warehouse appliances, and offer advice as to what potential buyers should keep in mind when looking at the many different data warehouse appliance offerings.
The title of your recent session at the TDWI conference was 'Demystifying Data Warehouse Appliances.' Why do data warehouse appliances need to be demystified?
Michael A. Schiff: There are just so many definitions floating around. "Appliance" has become a term that you see a lot, and there is a lot of hype associated with it despite the fact that it's a really good technology.
Kim Stanick: [We had a slide that summarized what all the various analysts have said about data warehouse appliances] and the various types of delineations that have come up. For instance, there is a native or complete stack kind of scenario, there is a packaged scenario, a software or virtual scenario, and an offload or accelerator scenario. Some analysts said that it had to have built-in redundancy to be an appliance, others said it had to be sold by data volume to be an appliance. And so we came up with a working definition for our course so that we could have a semantic conversation without having to define every sentence.
What definition did you folks come up with for the course?
Schiff: A combination of integrated hardware and software designed specifically for analytical processing. But the bottom line is this: Ask not what a data warehouse appliance is in strict terms; rather, ask what it can do for your organization.
What are some of the hardware considerations that come into play when considering a data warehouse appliance?
Stanick: There are known things about hardware that lend themselves to analytic processing versus [things that don't]. For instance, let's take [IBM's Balanced Warehouse]. Basically, IBM is saying, "Look, you shouldn't just throw in any particular model with any particular combination. You shouldn't just buy any old favorite server that you like. You need to pay attention to the components and the makeup of that server so that it's balanced for analytic processing." In other words, you're not just taking any old hardware and any old software and then doing some tuning with it. You're balancing to make sure that the sweet spots of those two things are balanced, are meant for each other and are good for analytic purposes.
Can you explain the value proposition associated with data warehouse appliances?
Stanick: The promise of data warehouse appliances is that they should be able to simplify the physical database design layer and the activities that make sure the software is tuned for the hardware. The whole premise of appliances is that there are certain classes of processes or workloads that are a large part of what people are trying to do when they do analytic processing. Obviously, there are outliers or unique scenarios, but those [processes or workloads] are known well enough to be able to marry a hardware and software layer together so that you don't have to worry about the integration anymore. It frees IT from that burden. You essentially offload that burden to a vendor and pay them for some support.
Am I correct in assuming that the key difference between data warehouse appliances and data marts is that data marts are much more focused in nature?
Schiff: Yes, exactly. And you can have huge data marts, [but] appliances are evolving to the point where initially they might have been single purpose, but now we're running into situations where they're being used for multiple purposes.
Is there an easy way for potential buyers to differentiate between a data mart and a data warehouse appliance?
Stanick: Data warehouse appliances [are] really keyed to analytic types of queries. So what's different here [are] things like full table scans, joins across tables and sorts and aggregations. Those are things that you typically had to spend a lot of time on in a traditional database environment, tuning and putting structures around the data to get the database to perform well for large sets of users. That's the kind of work that you're trying not to have to do anymore.
Schiff: There are exceptions, but for the most part [data warehouse appliances] use massive parallelism.
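To make the workload described above more concrete, here is a minimal Python sketch of a scan-and-aggregate query split across several workers, which is roughly the shape that massively parallel processing takes. Everything in it is an illustrative assumption rather than how any particular appliance works: the `SALES` rows, the round-robin `partition` helper and the `n_nodes` parameter are all hypothetical, and a thread pool merely stands in for separate hardware nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows: (region, sale_amount).
SALES = [
    ("east", 120), ("west", 80), ("east", 40),
    ("north", 200), ("west", 60), ("north", 10),
]

def partition(rows, n_nodes):
    """Distribute rows round-robin into n_nodes slices."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def scan_and_aggregate(rows):
    """Fully scan one slice and return partial sums per region."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial

def parallel_total_by_region(rows, n_nodes=3):
    """Aggregate each slice independently, then merge the partial results."""
    slices = partition(rows, n_nodes)
    # Threads stand in for independent nodes; a real system would use
    # separate processors and disks for each slice.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(scan_and_aggregate, slices))
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds the partial sums together
    return dict(sorted(total.items()))

print(parallel_total_by_region(SALES))
# {'east': 160, 'north': 210, 'west': 140}
```

The point of the sketch is that each slice is scanned in full with no index or hand-built tuning structure, and the answer does not depend on how many slices the data is spread across.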
What are some of the benefits of data warehouse appliances?
Schiff: For the most part, it has been proven that in the right situations they're quick enough to implement and they're low-cost enough that you might solve problems that you haven't thought of solving before.
Stanick: They also take the guesswork out of the acquisition process of the platform environment. Before data warehouse appliances, you had to decide what software you were going to get, what hardware you were going to get, and how much hardware you needed to get for what kind of performance level. It was much more guesswork; whereas now, [data warehouse appliances] basically unitize the data volumes or performance levels and unitize the approach so that you can understand pricing increments and you can have good faith that what you're buying is intended to work together. Before, the burden of that was on you, and the process of evaluation and benchmarking was longer because you may have been combining many different hardware vendors with many different software vendors and trying to see what worked best.
What are some of the pitfalls of data warehouse appliances?
Stanick: [There can be problems] if you apply a data warehouse appliance in the wrong environment. For example, you know it's not really a data warehousing workload or you're trying to do things that are pushing the envelope a little bit from the sweet spot of that particular product.
One of the [other] risks that we pointed out is the fact that a lot of these vendors are fairly new, and so you have that whole new-vendor risk potential, and each shop has to decide what their appetite is for new technology. Are they really an early adopter environment [and] willing to put up with a little bit of risk and some bumps to get something that [will be very extreme if it works] and offer better price performance? Or do they have more of a traditional environment where they really need to go down the path of [taking] this in baby steps?
What other advice do you have for firms that are mulling the purchase of a data warehouse appliance?
Stanick: Picking the technology that is best for what you're trying to do is very important. These are not one-size-fits-all, cookie-cutter types of offerings.
Schiff: The other thing to keep in mind from a business perspective is that these things can often blow up like mushrooms. It's because sometimes they're so successful and word spreads. [And remember that] data warehouse appliances don't necessarily replace a data warehouse; rather, they're part of the overall data warehousing architecture. There are situations where you might actually want to offload stuff from a data warehouse into a data warehouse appliance to query a subset of the data that's in the warehouse.