This is the second of a two-part series examining data management trends and market drivers for the second half of 2009 and what those trends will mean for the future of data warehousing. The recommendations address what IT directors, managers and executives should do about the trends.
Open source data warehouses
Open source has reached a critical mass, with enough product offerings to constitute a complete data warehousing architecture: front end, middle, and back end. Going forward, open source data warehouse offerings will continue to be enhanced and enriched in functionality. At the back end, MySQL is used by Infobright and Kickfire. ParAccel, Netezza and Greenplum use a version of PostgreSQL. DATAllegro (now a part of Microsoft) uses Open Ingres. ("Version" is the key term here, since, in some cases, vendors have enhanced the code, rendering it proprietary.) The architecture is rounded out with Pentaho and Talend for data integration and ETL in the middle layer, and Jaspersoft and Pentaho at the front end for information access and delivery. Thus, open source is a significant innovation, disrupting traditional software development, markets and pricing.
Master data management and data warehousing
Master data management (MDM) has come into its own as a trend that drives data warehousing. Sometimes data warehousing is the tail and MDM is the dog, and sometimes the reverse is the case. From the data warehouse perspective, master data dimensions are distinguished from elementary facts, such as which customer buys which product and when and where. The "fact" sits at the intersection of the dimensions in the basic star schema and is usually a quantitative data point about sales, shipments or related business metrics. From the master data perspective, customers, products, markets and other aspects of corporate memory are transactional processes in themselves that are not reducible to business intelligence but inform and give structure to it. The nice thing about MDM is that the business understands customers, products and markets and so can relate to the technology, at least at a high level. Thus, IT can use MDM to build a bridge to the business and take advantage of opportunities to partner and add value. Often, master data has to be cleaned up and governance made explicit so that it can contribute to the structuring and successful operation of the data warehouse.
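To make the distinction between dimensions and facts concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and chosen only for illustration; a production warehouse would have many more dimensions and attributes.

```python
import sqlite3

def build_star_schema():
    """Create a tiny in-memory star schema (all names and rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimensions: the master data (who and what).
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
        -- Fact: one row per sale, keyed by the dimensions it intersects.
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            sale_date   TEXT,
            store       TEXT,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "Widget"), (20, "Gadget")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
        (1, 10, "2009-07-01", "Chicago", 100.0),
        (1, 10, "2009-07-02", "Chicago", 50.0),
        (2, 20, "2009-07-02", "Boston", 75.0),
    ])
    return conn

def sales_by_customer_and_product(conn):
    """Aggregate the quantitative fact (amount) along two dimensions."""
    return conn.execute("""
        SELECT c.name, p.name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        JOIN dim_product  p ON p.product_id  = f.product_id
        GROUP BY c.name, p.name
        ORDER BY c.name, p.name
    """).fetchall()
```

The fact table carries only keys and measures; the descriptive detail the business recognizes lives in the dimension tables, which is where clean, governed master data pays off.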
The economics of data warehousing
The changing economics of data warehousing is a macro-trend that encompasses several more granular drivers. Developments such as open source, the ever-improving performance of commodity hardware and storage technology, virtualization of the data center, autonomic computing, workload balancing, and software improvements in parallel processing all add up to dramatic improvements in the price performance of data warehouses. Column-oriented analytic databases and data warehousing appliances also belong on this list. For example, the Transaction Processing Performance Council benchmarks show an order of magnitude improvement in price/performance over the past period as benchmarks leapfrog one another. By any interpretation, an order of magnitude is a dramatic event that deserves attention. This means one thing for end-user enterprises – more data warehousing for the dollar.
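The arithmetic behind an "order of magnitude" claim can be sketched as follows. Benchmark price/performance is reported as total system price divided by the throughput metric, so a lower figure is better. The function names and the dollar and throughput figures below are hypothetical and are not taken from any published result.

```python
import math

def price_performance(total_system_price, throughput):
    """Price per unit of benchmark throughput (lower is better)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return total_system_price / throughput

def orders_of_magnitude(old_ratio, new_ratio):
    """How many powers of ten the price/performance ratio improved by."""
    return math.log10(old_ratio / new_ratio)

# Hypothetical systems, for illustration only.
OLD_SYSTEM = price_performance(5_000_000, 100_000)   # 50.0 dollars per unit
NEW_SYSTEM = price_performance(1_000_000, 200_000)   # 5.0 dollars per unit
IMPROVEMENT = orders_of_magnitude(OLD_SYSTEM, NEW_SYSTEM)
```

In this made-up case the newer system costs one-fifth as much and delivers twice the throughput, which multiplies out to a tenfold (one order of magnitude) improvement.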
Advanced data warehousing recommendations
Exploit open source data warehousing to simplify operations and reduce costs. Open source offers new opportunities in the form of a new model of software development, licensing and pricing for data warehousing. A word of caution is required, however. For any enterprise application, expect to purchase a support agreement that covers evenings and weekends in the unlikely event of an unscheduled software outage.
Use data warehousing master data to see the forest for the trees. MDM and data warehousing team up to grow top-line revenue. How? In finance and insurance, clients are often hidden behind a diversity of unrelated accounts. Master data management performs a giant match-and-merge operation to find the common individuals (and then households) shared by multiple accounts. The payoff comes when the enterprise is able to cross-sell and up-sell a market basket of diverse products to those who do not have the complete portfolio of related products.
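The matching step described above can be sketched in a few lines. This is a deliberately simplified illustration that matches on an exact normalized name plus birth date and groups households by normalized address; real MDM products use far more sophisticated fuzzy matching and survivorship rules. All field names, account records and the product portfolio are hypothetical.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(text.lower().replace(".", "").replace(",", "").split())

def match_individuals(accounts):
    """Group accounts that share a normalized (name, birth date) match key."""
    people = defaultdict(list)
    for acct in accounts:
        key = (normalize(acct["name"]), acct["birth_date"])
        people[key].append(acct)
    return dict(people)

def group_households(people):
    """Group matched individuals by normalized address."""
    households = defaultdict(set)
    for key, accts in people.items():
        for acct in accts:
            households[normalize(acct["address"])].add(key)
    return dict(households)

def cross_sell_gaps(people, portfolio):
    """List the portfolio products each individual does not yet hold."""
    gaps = {}
    for key, accts in people.items():
        missing = portfolio - {a["product"] for a in accts}
        if missing:
            gaps[key] = sorted(missing)
    return gaps

# Hypothetical account records from unrelated source systems.
ACCOUNTS = [
    {"account_id": "CHK-1", "name": "Jane Q. Doe", "birth_date": "1970-01-15",
     "address": "12 Oak St.", "product": "checking"},
    {"account_id": "INS-7", "name": "jane q doe", "birth_date": "1970-01-15",
     "address": "12 Oak St", "product": "auto insurance"},
    {"account_id": "SAV-3", "name": "John Doe", "birth_date": "1968-03-02",
     "address": "12 oak st", "product": "savings"},
]
PORTFOLIO = {"checking", "savings", "auto insurance"}
```

Here three accounts resolve to two individuals in one household, and the gap list shows which products each person could still be offered.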
Use data warehousing master data to substitute information for inventory. For those firms in retail and consumer packaged goods (CPG) that move physical inventory through a supply chain, inexpensive information can be substituted for expensive inventory. Using the data warehouse to implement demand planning (and so reduce inventory) requires having a consistent, unified representation of the product master data dimension.
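A short sketch shows why demand planning depends on a unified product dimension: the same physical product often appears under different codes in different source systems, and demand can only be totaled once those codes map to one master identifier. The mapping table, system names, SKUs and the simple moving-average forecast are all assumptions made for illustration, not a description of any particular planning tool.

```python
from collections import defaultdict

# Hypothetical product master: (source system, local SKU) -> master product ID.
PRODUCT_MASTER = {
    ("pos", "SKU-001"): "P100",
    ("web", "W-77"): "P100",
    ("pos", "SKU-002"): "P200",
}

def unified_demand(sales, product_master):
    """Sum units per (master product, week); set aside rows with no mapping."""
    demand = defaultdict(int)
    unmapped = []
    for row in sales:
        master_id = product_master.get((row["system"], row["sku"]))
        if master_id is None:
            unmapped.append(row)  # a data-quality issue for governance to resolve
            continue
        demand[(master_id, row["week"])] += row["units"]
    return dict(demand), unmapped

def moving_average_forecast(demand, master_id, weeks):
    """Naive forecast: average weekly units over the given weeks."""
    history = [demand.get((master_id, week), 0) for week in weeks]
    return sum(history) / len(history)

# Hypothetical weekly sales rows from two channels.
SALES = [
    {"system": "pos", "sku": "SKU-001", "week": 1, "units": 10},
    {"system": "web", "sku": "W-77",    "week": 1, "units": 5},
    {"system": "pos", "sku": "SKU-001", "week": 2, "units": 8},
    {"system": "web", "sku": "W-77",    "week": 2, "units": 7},
    {"system": "pos", "sku": "SKU-002", "week": 1, "units": 3},
    {"system": "web", "sku": "X-9",     "week": 2, "units": 1},
]
```

Without the master mapping, the point-of-sale and web channels would each appear to sell a different product, and each would carry its own safety stock; with it, planners see one demand signal per product.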
Get more data warehousing for the dollar. Exploit the changing economics of data warehousing. Confront (and reduce) costs by means of open source, autonomic computing, virtualization and intelligent workload balancing. Publicly audited data warehousing benchmarks are showing an order of magnitude improvement in price/performance -- for example, ParAccel's 30 terabyte (TB) Transaction Processing Performance Council benchmark result. Competition will continue to be intense at the high end of the data warehousing market, catalyzed by trends in column-oriented databases, appliances and open source. The changing economics of data warehousing will accrue to the advantage of end-user enterprises that can take action in the current buyer's market.
About the Author
Lou Agosta, Ph.D., is an independent industry analyst specializing in data warehousing, data quality, data mining, and business intelligence. Key word: data. His book, The Essential Guide to Data Warehousing, is published by Prentice Hall PTR. Lou can be reached at LAgosta@acm.org.