This article originally appeared on the BeyeNETWORK.
Traditionally, people who built data warehouses paid a lot of money to do so. That has certainly been the pattern, but will it always be the pattern? A new wind is blowing, and alternatives are starting to appear that have the potential to greatly lower the price of building and operating the data warehouse.
Even more interesting, these opportunities for savings appear in very different sectors of the data warehouse marketplace. Let’s take a look at some of these promising new technologies.
On the immediate horizon are the data warehouse appliances. One of the leading and most promising data warehouse appliances is Netezza. With Netezza, you can have lots of storage at an inexpensive price. In the same ballpark is Greenplum. Greenplum is a technology that manages the infrastructure in a parallel manner. With both Netezza and Greenplum, you have the opportunity to grow your data warehouse to an almost unlimited size.
But the most intriguing data warehouse appliance is probably Dataupia, a newcomer to the data warehouse scene. Not only does Dataupia lower the cost of storage dramatically, but it also speeds performance. However, the real advantage that Dataupia offers is transparency. In this case, transparency means that Dataupia’s product is plug-compatible, or close to it. Dataupia’s storage goes in, traditional storage goes out, and the only person who knows the difference is the chief financial officer, who is now paying far less than in the past. There are no conversions, no switching of software, no changing of database management systems and no alterations to queries.
The cost of storage is normally not an issue in the early stages of building a data warehouse. It becomes an issue as each day and each month passes and more data accumulates in the data warehouse. Not only is more space required, but parallelism, addressability, backup and recovery all start to become issues. In the case of very large volumes of data, even indexing becomes an issue.
So when it comes to the long-term costs of data warehousing, the requirement for large amounts of storage becomes a burning financial issue. Fortunately, the day has passed when the only alternative was high-performance disk storage.
In another arena of the data warehouse world, there is more good news. For years, there has been a need for ETL (extract, transform and load) processing and software. The day when all the transformation programming was done by hand has long passed. In the early days, the price of ETL was reasonable. Does anyone remember when you could buy an ETL package for $25,000? Since then, the deal size of ETL has grown steadily. Today, depending on the options you want, the number of users you have and the number of nodes you want enabled, ETL may cost you seven figures or more. In any case, it is well beyond the original offering price.
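For readers who never wrote transformation code by hand, the three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration, not how any commercial ETL package works: the source data, the `customer` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (illustrative data only).
SOURCE_CSV = """customer_id,name,signup_date
1, alice ,2007-01-15
2,BOB,2007-02-03
"""


def extract(text):
    """Extract: read the raw rows out of the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast the key to an integer and tidy up the names."""
    return [
        (int(r["customer_id"]), r["name"].strip().title(), r["signup_date"])
        for r in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT customer_id, name FROM customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Multiply this by hundreds of sources and thousands of transformation rules and it becomes clear why packaged ETL software found a market.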
Enter Talend. Talend is designed for the mid-market, and its price tag is one that everyone (except the existing ETL vendors) loves. The basic kernel of Talend is free. Talend does offer subscription services and add-on modules, but the product itself can be downloaded from the Internet for free.
This opens up the marketplace to a class of customer that has heretofore been squeezed out – the mid-market. There is a whole class of customers who need a data warehouse but could not afford the pricey technology of traditional ETL. This is indeed good news for consumers.
Another interesting product for the development of the data warehouse is RapidAce. For years, data warehouse designers have needed a support tool for the initial design of the data warehouse, and they have had to put up with whatever was available in the marketplace, whether or not that product really supported data warehousing.
Now we have RapidAce, which is built specifically for data warehouse design. In particular, RapidAce supports a form of data warehouse design known as the “data vault.” The best way to describe data vault design is as a normalized, optimized design. (For more information about data vault, refer to the works by Dan Linstedt and Hans Hultgren.)
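To give a feel for what “normalized” means here, the sketch below lays out a tiny data vault style schema. It follows the hub, link and satellite structure described in Linstedt’s published work on data vault; it is not output from RapidAce, and every table and column name is hypothetical. SQLite is used only so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: hubs hold business keys, links hold relationships
# between hubs, and satellites hold descriptive attributes over time.
DDL = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_bk    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_purchase (
    purchase_hk   TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# One customer in the hub, two versions of its attributes in the satellite.
conn.execute(
    "INSERT INTO hub_customer VALUES ('c1', 'CUST-001', '2008-01-01', 'crm')"
)
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "2008-01-01", "Alice", "Denver", "crm"),
        ("c1", "2008-06-01", "Alice", "Boston", "crm"),
    ],
)

# History is kept by adding satellite rows, never by updating the hub.
history = conn.execute(
    "SELECT load_date, city FROM sat_customer_details "
    "WHERE customer_hk = 'c1' ORDER BY load_date"
).fetchall()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
print(history)
```

The point of the split is that a change of address adds a satellite row and leaves the hub and its relationships untouched, which is what lets the design absorb history without rework.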
Data warehouses do not have to cost an arm and a leg. In fact, now the biggest determining factor in the design and operation of the data warehouse is the designer.
What do vendors of existing technology have to say about these advances? There is one word to describe vendors of existing technology, and that word is complacent. It is a sad statement that the really innovative, really exciting technology comes from startups. It is an interesting perspective – when was the last time that a large existing vendor came up with a new and revolutionary idea?
Not coincidentally, the cost-saving measures in data warehousing technology do not come from the existing large vendors. Instead, the really good ideas come from startups. Large corporations are too busy protecting their existing products and market share to try anything new and innovative.