Data warehouse software not ready for retirement in age of big data

Hadoop clusters, NoSQL databases and other modern technologies have roles to play in business intelligence and analytics environments. But traditional data warehouses still do, too.

The original intent of the data warehouse was to segregate analytical operations from mainframe transaction processing in order to avoid slowdowns in transaction response times and to minimize the added CPU costs incurred by running ad hoc queries and creating and distributing reports. Over time, the enterprise data warehouse (EDW) became a core component of information architectures, and it's now rare to find a mature business that doesn't employ some form of EDW or a collection of smaller data marts to support business intelligence, reporting and analytics applications.

But as organizations increasingly adopt newer technologies -- Hadoop clusters, NoSQL, columnar and in-memory databases, data virtualization tools -- questions are being raised about the future relevance of data warehouse software in enterprise IT infrastructures. Some people have already started to ring the death knell for the EDW, predicting its impending demise at the hands of big data systems and high-performance computing platforms.

And those other technologies do offer some advantages over the traditional data warehouse. Hadoop is a distributed processing framework that promises high levels of performance scalability using low-cost commodity hardware. In-memory databases and columnar software geared to analytical uses can also dramatically increase processing performance. NoSQL databases bypass the schema strictures of mainstream relational database management systems and provide wider flexibility in developing applications. Layering a data virtualization tool on top of systems enables on-the-fly integration and in some cases also allows transaction processing and analytical applications to simultaneously touch the same data sets; both of those capabilities can reduce the need to extract and load data into a segregated warehouse.

Look under the covers on IT costs

Yet the reports of the death of the data warehouse may prove to be greatly exaggerated. From a financial perspective, the motivations to migrate to new technologies must be balanced with the merits of continuing to leverage existing investments in EDW technology that's already in production use -- and still producing the data goods. It's also worth noting that realizing the perceived value of a radical change sometimes requires a greater investment than originally anticipated.

As an example, consider infrastructure costs. There's an implication that downloading and installing open source software such as Hadoop on a homegrown setup of interconnected commodity computing systems provides a low-cost alternative to the high-end servers or mainframes that typically host data warehouses. While it's possible to create a test-bed environment using that approach, it takes more for a Hadoop cluster to deliver on its performance promises in production applications: An organization must invest not only in new technology but also in skilled staff resources to deploy and manage the platform.

Hadoop's storage elasticity also suggests nearly unlimited disk space. But it isn't always smooth sailing on the Hadoop data lake. Realistically, the availability of a seemingly inexhaustible amount of storage may encourage users to save data unnecessarily, rapidly filling the available disk space with a broad array of unstructured (and ungoverned) data that may not have any real business value.

A blended approach to managing data

Some other key facts we should recognize:

  • Organizations that have invested significant amounts of money and effort in their data warehouse environment would need to see a sizable ROI projection for a Hadoop or NoSQL deployment before deciding to completely rip out the EDW and replace it.
  • Because of the nature of open source development, technologies like Hadoop and the various tools surrounding it still have some time to go before they reach the level of maturity that data warehouse software has attained -- if they ever get there.
  • Even though components of the Hadoop ecosystem are intended to replicate the dimensional schemas and interactive analytical queries supported by data warehouses, Hadoop itself remains largely batch-oriented for the near term.
  • Many business users are still dependent on the reports and ad hoc query capabilities of their trusted data warehouses.

Of course, you can't ignore the availability of a parallel processing platform that can run complex computational algorithms to analyze massive volumes of data in ways that can't be done using a system geared to dimensional slicing and dicing. The results of those kinds of analytics applications can be used to augment the data in an enterprise data warehouse, enhancing customer profiles and enabling more informed business decisions.

That suggests that while Hadoop, NoSQL and other alternative technologies are likely to emerge as significant components of BI and analytics architectures, the optimal strategy will blend them with the EDW. It isn't time to close the door on the data warehouse just yet.

About the author:
David Loshin is president of Knowledge Integrity Inc., a consulting and development services company that works with clients on big data, business intelligence and data management projects. He also is the author or co-author of various books, including Using Information to Develop a Culture of Customer Centricity. Email him at loshin@knowledge-integrity.com.


This was last published in January 2015

Join the conversation


Great article and so true. HP has responded to those challenges with the Hybrid Data Management approach to combine traditional and new.


http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=4AA5-3861ENW

Another consideration is that migrating analytics to the newer platforms often does not achieve the promised benefits. The effort expended for the move drains the budget and patience and leaves nothing for improvement. Typically, the downstream applications (BI, analytics, reporting) get lost and are only gradually replaced over time. The motivation for migrating from a data warehouse to a "data lake" is to reduce the cost of expensive hardware upgrades and database licenses; the soft costs of a Hadoop migration are usually overlooked.

However, vendors of data warehouse technology cannot survive on existing customers; they have to grow their businesses too. If the tide of opinion turns against them, they will end up in a death spiral. So the merits of the EDW in today's hybrid world may be meaningless.

Hi Neil, good to hear your $0.02. Your point is well-taken. What I expect to continue to see is the existing DW vendors gradually blending their offerings to include newer technologies like Hadoop, aligned with their existing platforms. This seems to be what Teradata is doing. Of course, as the Clouderas and Hortonworks of the world streamline and optimize their SQL offerings, the objections to Hadoop will evaporate, as this will help reduce the cost of development to get meaningful results programmatically as opposed to declaratively. I would look at a 5-7 year time horizon to see increased mainstreaming of Hadoop as the "heir" to the mid-tier appliance.