Cloud data warehouse guide: Using Redshift, rival platforms
A comprehensive collection of articles, videos and more, hand-picked by our editors
Data warehousing has been an important part of the data architecture and IT infrastructure of many organizations for almost 30 years. But despite its heritage, confusion has besieged the data warehouse in recent years.
With the advent of data lakes, big data and advanced analytics, some within the IT industry have questioned whether the data warehouse is still relevant. The short answer is: absolutely. The longer answer is addressed in this four-part series, which details what you need to know before buying a data warehouse platform.
Despite all the hype surrounding big data and analytics, companies in all industries and of all but the smallest size are using data warehouses to deliver actionable BI that executives and managers can use to make important decisions. Data warehouse platforms provide business people with a window into their organization's historical activities -- both successes and failures. While big data and analytics do have their place, data warehousing stands out as a practical, proven practice for analyzing structured business data in order to support data-driven decision making.
What is a data warehouse platform?
A data warehouse is best defined by the type of data it stores, and the people who use it. Designed for decision support and BI activities, the data warehouse is separated from the day-to-day online transaction processing (OLTP) applications that drive the core business, thereby reducing contention for both operational transactions and analytical queries.
A data warehouse is typically read-only, with the data organized according to business requirements, rather than by computer processes. The data warehouse classifies information by subjects of interest to business analysts and managers -- for example, customers, products and accounts. The data is inserted (or loaded) into the warehouse, then made available for querying by business users.
The information stored in a data warehouse is historical, spanning transactions that have occurred over time. For this reason, warehoused data is often summarized or aggregated to make it easier to scan, access and query. Redundant data is often included in a data warehouse in order to provide users with multiple views of information that present it in logical, easily understood groupings.
What type of data resides in a
A data warehouse is a collection of data that's:
- Separate from operational systems
- Accessible and available for queries
- Subject-oriented by business
- Integrated and consistently named and defined
- Associated with defined periods of time
- Static (non-volatile); meaning that updates aren't made
Data warehouses contain information that has been culled from operational systems, as well as possibly external data such as third-party point-of-sale information. Data in the data warehouse is consolidated and stored in a consistent form for the enterprise, even when the operational data is formatted, stored and maintained in many different ways. This permits business analysts to examine information in the data warehouse platform without having to transform the data or question its integrity.
For example, a typical query submitted to a data warehouse might be: "What was the total revenue produced for the central region for product 'x' during the first quarter of last year?"
Although data doesn't usually change once it's stored in the data warehouse, new data will be loaded periodically. How frequently the data gets added is based on the latency requirements of the BI applications and decision support systems that use the data warehouse. Many modern data warehouses are near-real-time, meaning the latency is low between when data is created or changed in a production system and when the new data is moved to the data warehouse.
The most significant distinguishing feature of the data warehouse is that it enables a business to analyze its data over time, whereas operational databases support the production systems used to run the business. Operational data is atomic in nature, continually changing as updates are made as part of OLTP applications, and reflecting only the current value of the last transaction.
How companies use data warehouses
The data warehouse supports online analytical processing (OLAP), which enables high-level end users to gain insight into business operations through interactive and iterative access to the stored data. This enables business executives to improve corporate strategies and operational decision making by querying the data warehouse to examine business processes, performance and trends.
For example, a data warehouse can be used to perform the following tasks:
- Track, manage and improve corporate performance.
- Monitor and modify a marketing campaign.
- Review and optimize logistics and operations.
- Increase the efficiency and effectiveness of product management and development.
- Query, join and access disparate information culled from multiple sources.
- Manage and enhance customer relationships.
- Forecast future growth, needs and deliverables.
- Cleanse and improve the quality of your organization's data.
There are many benefits to deploying and effectively using a data warehouse. From an IT perspective, separating the analytical processes in a data warehouse from the operational processes in the production applications and transactions can enhance the performance of both areas. From a business perspective, a data warehouse platform can deliver a practical way to view the past without affecting the daily operations of the business. By querying and analyzing data in the data warehouse, organizations can improve operations and enable more efficient business processes, thereby increasing revenue and raising profits.
Many types of software products can be categorized as data warehouse tools, including database management systems (DBMSes), database appliances, OLAP and reporting tools, BI tools, dashboards, extract, transform and load tools and other types of data integration software, and software repositories. All of these tools are used collectively to create, deploy and manage the data warehouse environment.
Types of data warehouse platforms
The most common type of data warehouse platform is a DBMS. In most cases, a data warehouse DBMS is relational, but any type of DBMS can be used. In many cases, a data warehouse DBMS has additional features and characteristics that enable it to be used effectively for data warehousing activities and operations. A data warehouse platform might also be bundled with additional software, such as the technologies noted above. Of course, bundled services can affect cost while improving manageability (because integrated features can be easier to use).
The following four types of data warehouse platforms will be examined in this series, as well as the market leaders in each category:
- Traditional DBMSes, mostly relational but not exclusively, from vendors including IBM, Microsoft, Oracle and SAP.
- Specialized analytics DBMSes. Data warehouse platforms that fall into this category aren't traditional DBMSes; they've been augmented in some significant way to support data warehouse workloads. One example of a database-specific data warehouse product is a columnar DBMS like those offered from HP or SAP.
- Appliances designed to provide turnkey data warehousing services right out of the box. These typically combine software and hardware as a package with a relational or analytics DBMS pre-installed and configured on the requisite hardware. The server is set up with the proper amount of memory and storage, and the DBMS is installed by the vendor; all the customer needs to do is plug it in and turn it on. (Well, that's what the vendors say, at any rate.) Examples here include products from Teradata, IBM and Oracle.
- Cloud data warehouse offerings, where the DBMS isn't installed on premises but is accessed over the Internet. A cloud offering delivers a data warehouse as a service, without requiring any specific DBMS or hardware at the client site. Microsoft, Amazon and IBM are market leaders in the cloud data warehouse platform space.
The vendor landscape for data warehouse platforms is somewhat varied. It's dominated by the traditional DBMS providers, but there are data warehousing-specific vendors, specialized analytics DBMSes and cloud providers that are worthy of consideration for certain types of buyers. Additionally, there are a growing number of hybrid solutions that integrate multiple data storage and access capabilities for structured and unstructured data, such as those provided by Actian and Pivotal, as well as others.
Who implements, manages and uses data warehouse platforms?
The decision to purchase and implement a data warehouse is usually made by a line-of-business executive or a high-level IT executive in collaboration with lines of business. The project to build a data warehouse comprises multiple individuals, including business subject-matter experts, data architects, database administrators, coders and SQL query experts, along with project managers.
The ongoing management of the data warehouse typically falls to the database administration unit, perhaps augmented by data architects and performance analysts. But it's the line-of-business personnel who are the end users of the data warehouse.
Now that you understand what a data warehouse is, the next step is to determine your organization's needs and objectives by taking a closer look at some typical use cases for which these tools are optimized -- a topic that will be covered in a separate article.
Find out why big data vendors shouldn't bash data warehouses
Learn how to determine when purchasing a relational DBMS makes sense
Listen to an expert discuss cloud data warehouses
Learn how the Charlotte Hornets built a data warehouse using Phizzle