This content is part of the Buyer's Guide: A buyer's guide to selecting the best data warehouse product
Evaluate Weigh the pros and cons of technologies, products and projects you are considering.

The benefits of deploying a data warehouse platform

Big data may be all the rage, but data warehouse platforms are still being utilized by companies of all sizes. Expert Craig S. Mullins takes a look at the technology.

Data warehousing has been an important part of the data architecture and IT infrastructure of many organizations for almost 30 years. But despite its heritage, confusion has besieged the data warehouse in recent years.

With the advent of data lakes, big data and advanced analytics, some within the IT industry have questioned whether the data warehouse is still relevant. The short answer is: absolutely. The longer answer is addressed in this four-part series, which details what you need to know before buying a data warehouse platform.

Despite all the hype surrounding big data and analytics, companies in all industries and of all but the smallest size are using data warehouses to deliver actionable BI that executives and managers can use to make important decisions. Data warehouse platforms provide business people with a window into their organization's historical activities -- both successes and failures. While big data and analytics do have their place, data warehousing stands out as a practical, proven practice for analyzing structured business data in order to support data-driven decision making.

What is a data warehouse platform?

A data warehouse is best defined by the type of data it stores, and the people who use it. Designed for decision support and BI activities, the data warehouse is separated from the day-to-day online transaction processing (OLTP) applications that drive the core business, thereby reducing contention for both operational transactions and analytical queries.

A data warehouse is typically read-only, with the data organized according to business requirements, rather than by computer processes. The data warehouse classifies information by subjects of interest to business analysts and managers -- for example, customers, products and accounts. The data is inserted (or loaded) into the warehouse, then made available for querying by business users.

The information stored in a data warehouse is historical, spanning transactions that have occurred over time. For this reason, warehoused data is often summarized or aggregated to make it easier to scan, access and query. Redundant data is often included in a data warehouse in order to provide users with multiple views of information that present it in logical, easily understood groupings.

What type of data resides in a
data warehouse?

A data warehouse is a collection of data that's:

  • Separate from operational systems
  • Accessible and available for queries
  • Subject-oriented by business
  • Integrated and consistently named and defined
  • Associated with defined periods of time
  • Static (non-volatile); meaning that updates aren't made

Data warehouses contain information that has been culled from operational systems, as well as possibly external data such as third-party point-of-sale information. Data in the data warehouse is consolidated and stored in a consistent form for the enterprise, even when the operational data is formatted, stored and maintained in many different ways. This permits business analysts to examine information in the data warehouse platform without having to transform the data or question its integrity.

For example, a typical query submitted to a data warehouse might be: "What was the total revenue produced for the central region for product 'x' during the first quarter of last year?"

Although data doesn't usually change once it's stored in the data warehouse, new data will be loaded periodically. How frequently the data gets added is based on the latency requirements of the BI applications and decision support systems that use the data warehouse. Many modern data warehouses are near-real-time, meaning the latency is low between when data is created or changed in a production system and when the new data is moved to the data warehouse.

The most significant distinguishing feature of the data warehouse is that it enables a business to analyze its data over time, whereas operational databases support the production systems used to run the business. Operational data is atomic in nature, continually changing as updates are made as part of OLTP applications, and reflecting only the current value of the last transaction.

How companies use data warehouses

The data warehouse supports online analytical processing (OLAP), which enables high-level end users to gain insight into business operations through interactive and iterative access to the stored data. This enables business executives to improve corporate strategies and operational decision making by querying the data warehouse to examine business processes, performance and trends.

For example, a data warehouse can be used to perform the following tasks:

  • Track, manage and improve corporate performance.
  • Monitor and modify a marketing campaign.
  • Review and optimize logistics and operations.
  • Increase the efficiency and effectiveness of product management and development.
  • Query, join and access disparate information culled from multiple sources.
  • Manage and enhance customer relationships.
  • Forecast future growth, needs and deliverables.
  • Cleanse and improve the quality of your organization's data.

There are many benefits to deploying and effectively using a data warehouse. From an IT perspective, separating the analytical processes in a data warehouse from the operational processes in the production applications and transactions can enhance the performance of both areas. From a business perspective, a data warehouse platform can deliver a practical way to view the past without affecting the daily operations of the business. By querying and analyzing data in the data warehouse, organizations can improve operations and enable more efficient business processes, thereby increasing revenue and raising profits.

Many types of software products can be categorized as data warehouse tools, including database management systems (DBMSes), database appliances, OLAP and reporting tools, BI tools, dashboards, extract, transform and load tools and other types of data integration software, and software repositories. All of these tools are used collectively to create, deploy and manage the data warehouse environment.

Types of data warehouse platforms

The most common type of data warehouse platform is a DBMS. In most cases, a data warehouse DBMS is relational, but any type of DBMS can be used. In many cases, a data warehouse DBMS has additional features and characteristics that enable it to be used effectively for data warehousing activities and operations. A data warehouse platform might also be bundled with additional software, such as the technologies noted above. Of course, bundled services can affect cost while improving manageability (because integrated features can be easier to use).

The following four types of data warehouse platforms will be examined in this series, as well as the market leaders in each category:

  1. Traditional DBMSes, mostly relational but not exclusively, from vendors including IBM, Microsoft, Oracle and SAP.
  2. Specialized analytics DBMSes. Data warehouse platforms that fall into this category aren't traditional DBMSes; they've been augmented in some significant way to support data warehouse workloads. One example of a database-specific data warehouse product is a columnar DBMS like those offered from HP or SAP.
  3. Appliances designed to provide turnkey data warehousing services right out of the box. These typically combine software and hardware as a package with a relational or analytics DBMS pre-installed and configured on the requisite hardware. The server is set up with the proper amount of memory and storage, and the DBMS is installed by the vendor; all the customer needs to do is plug it in and turn it on. (Well, that's what the vendors say, at any rate.) Examples here include products from Teradata, IBM and Oracle.
  4. Cloud data warehouse offerings, where the DBMS isn't installed on premises but is accessed over the Internet. A cloud offering delivers a data warehouse as a service, without requiring any specific DBMS or hardware at the client site. Microsoft, Amazon and IBM are market leaders in the cloud data warehouse platform space.

The vendor landscape for data warehouse platforms is somewhat varied. It's dominated by the traditional DBMS providers, but there are data warehousing-specific vendors, specialized analytics DBMSes and cloud providers that are worthy of consideration for certain types of buyers. Additionally, there are a growing number of hybrid solutions that integrate multiple data storage and access capabilities for structured and unstructured data, such as those provided by Actian and Pivotal, as well as others.

Who implements, manages and uses data warehouse platforms?

The decision to purchase and implement a data warehouse is usually made by a line-of-business executive or a high-level IT executive in collaboration with lines of business. The project to build a data warehouse comprises multiple individuals, including business subject-matter experts, data architects, database administrators, coders and SQL query experts, along with project managers.

The ongoing management of the data warehouse typically falls to the database administration unit, perhaps augmented by data architects and performance analysts. But it's the line-of-business personnel who are the end users of the data warehouse.

Now that you understand what a data warehouse is, the next step is to determine your organization's needs and objectives by taking a closer look at some typical use cases for which these tools are optimized.

Next Steps

Find out why big data vendors shouldn't bash data warehouses

Learn how to determine when purchasing a relational DBMS makes sense

Listen to an expert discuss cloud data warehouses

Learn how the Charlotte Hornets built a data warehouse using Phizzle

Dig Deeper on Data warehouse software

Join the conversation


Send me notifications when other members comment.

Please create a username to comment.

Which data warehouse platform technologies is your organization considering: a relational DBMS, a database-specific data warehouse product, an appliance or a cloud offering?
Nice information presented in very easy language, helping understand the basics of DW. Thanks.
Good coverage of the overall space, one note is that the list you have of cloud data warehousing market leaders surprised me a bit. Microsoft has announced plans for a future offering, but does not yet have a product on the market, so seems a bit early to crown them a market leader. IBM has a recent entry (dashDB) on the market that is still in early stages. There are also a small number of startups in the space that have interesting offerings. Amazon is a clear leader here, but the future looks interesting!
Interesting article. Your readers might also find real user reviews for all the major Data Warehouse solutions on IT Central Station to be helpful:

According to the IT Central Station user community, Oracle Exadata is the #1 Data Warehouse solution. This user writes, "We have realized a lot of tangible and intangible benefits from the machine. It would be difficult to put a dollar figure to it because the machine allows for things that simply cannot be done without it. So in that way, we can safely say that the machine paid for itself within the first 12 months." You can read the rest of his review here: