When Amazon Web Services (AWS) announced its Redshift data warehouse technology in late 2012, the cloud computing vendor did more than just introduce a new product offering -- it also opened the doors to the cloud data warehouse market as a whole. Before the debut of Amazon Redshift, data warehousing was essentially an on-premises initiative, with data migration and security issues playing big roles in keeping warehoused stores of corporate information inside the walls of organizations. Redshift made the idea of deploying a data warehouse in the cloud viable, with at least the promise of substantial cost savings compared with installing and running traditional data warehouse systems.
By March 2014, Redshift had earned AWS a place among the vendors ranked in Gartner Inc.'s annual Magic Quadrant report on data warehouse platforms and analytical databases. Since then, it has been joined by a variety of cloud-based data warehouse competitors, from startups and major vendors alike. The data-warehouse-as-a-service options now available to prospective users include IBM's dashDB and Microsoft's Azure SQL Data Warehouse, as well as technologies from smaller companies such as Snowflake Computing and CoolaData. In addition, Oracle, Teradata and other data management vendors offer cloud-based versions of their data warehouse database platforms.
In addition to the potential savings, cloud data warehousing can enable user organizations to quickly get systems up and running. Cloud services can also be easily scaled up or down as data and business needs change. But fundamental data management processes -- data integration, data quality, data governance, master data management -- still need to be applied to information that's warehoused in the cloud. And moving on-premises data to the cloud remains an issue: In a podcast Q&A with SearchDataManagement, Rick Sherman, founder of consultancy Athena IT Solutions, said it can be cost-prohibitive and take more work than is worthwhile, "especially if you have an extensive number of data sources and … a large volume [of data]." As a result, cloud data warehouse users often are organizations that run business applications in the cloud, generating much or all of their data there to begin with.
In this Essential Guide, you'll find more information about Amazon Redshift and other cloud platforms for data warehousing, plus expert insight and real-world advice on managing cloud data warehouses. It also includes articles on general data warehouse best practices and trends.
1. The state of data warehousing
Data warehouse trends, analysis and advice
Hadoop data lakes and other big data systems capture a lot of attention and headlines these days, but data warehouses still have their place in most organizations, for supporting analysis of both current and historical data. Moving them to the cloud changes the warehousing game somewhat, adding the usual concerns about housing data off premises, but potentially reducing costs as well as the complexity that often surrounds launching and maintaining data warehouses. The articles in this section provide expert insights on the continuing need for data warehousing and the potential advantages -- and risks -- of cloud data warehouse platforms.
Consultant Craig S. Mullins details the basics of data warehouse platforms, including cloud-based warehousing services, and explains why organizations of all sizes are still utilizing them.
Cloud services offer a variety of data warehousing benefits, including cost savings -- but you should also be aware of the potential risks, cautions consultant David Loshin.
IT recruiter Matt Mueller says data warehousing is still a good career path to pursue, but adds that warehousing-related jobs at many companies may increasingly take on big data aspects as well.
It turns out that traditional data warehouses are playing a big role in many big data and advanced analytics initiatives, thanks in part to cloud support and other new features added by vendors.
In a podcast Q&A, consultant Rick Sherman discusses the maturity of cloud-based data warehouses and big data systems, and shares his thoughts on suitable uses for those technologies.
IT and business intelligence managers at a TDWI conference offer advice on adopting Agile software development methodologies as part of data warehouse and BI projects.
2. Keeping up with Redshift
Amazon Redshift developments and deployments
Amazon Web Services' data warehouse platform established the cloud as a viable option for organizations that wanted a faster and potentially more cost-effective way to deploy data warehouses. Introduced in 2012, Redshift also supported fast query performance compared with earlier cloud data warehouse offerings, thanks to its use of columnar data storage. But there's more to successfully using the AWS software than setting up a data warehouse and starting to run queries. Get advice on deploying and managing Amazon Redshift with the tip articles and news stories in this section.
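To make the columnar-storage point concrete, here is a minimal Python sketch of why a column-oriented layout can speed up analytical queries. It is a toy illustration, not Redshift's actual storage engine: the table, the field names and the two helper functions are all hypothetical, and "values touched" is only a rough stand-in for the data a scan has to read.

```python
# Toy illustration only: the same small table held row-wise and column-wise.
# Table contents and field names are hypothetical.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.25},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(table, field):
    """Sum one field, counting every value a row-wise scan passes over."""
    touched = 0
    total = 0.0
    for row in table:
        touched += len(row)  # the whole row is read to reach one field
        total += row[field]
    return total, touched


def total_column_store(table, field):
    """Sum one field, reading only that column's values."""
    values = table[field]
    return sum(values), len(values)


print(total_row_store(rows, "amount"))
print(total_column_store(columns, "amount"))
```

Both functions return the same total, but the column-wise version reads only the one column the aggregate needs, which is the general idea behind columnar storage for analytical workloads.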
Get tips on using the Python language to set up user-defined functions in Amazon Redshift -- a step that can help reduce the need to manipulate data, minimizing data fragmentation in the process.
Support for user-defined functions had been on the wish list of Redshift users almost as long as the cloud data warehouse platform had been available -- and AWS finally delivered.
Amazon Redshift user XO Group Inc. tapped third-party software to capture data and transfer it into the cloud-based data warehouse -- a key to making the deployment work, according to an XO exec.
Redshift's ability to run queries quickly is just one reason why the platform redefined what's possible with data warehouses in the cloud, says consultant David Linthicum.
Consultant Dan Sullivan lists four questions to ask internally if your organization is deciding between Amazon Redshift and the company's Relational Database Service technology.
Etix already had an Oracle production database -- but a Redshift data warehouse proved to be more cost-effective for the ticketing company than an on-premises Oracle option.
Early interest among large companies in Amazon's cloud-based data warehouse service was higher than initially expected, according to Forrester Research analyst Noel Yuhanna.
3. Other options in the cloud
Alternatives to AWS: Microsoft, IBM and more
Amazon Web Services isn't the only vendor to consider when evaluating a potential cloud data warehouse deployment. Microsoft is continuing its push into the cloud platform market with Azure SQL Data Warehouse, which is based on a massively parallel processing version of the company's SQL Server database. IBM offers dashDB, a cloud-based data warehouse system built around technologies from its DB2 and Netezza software. Smaller vendors have also entered the cloud data warehousing fray, and on-premises data warehouse software from various vendors is becoming available in the cloud as well. Learn more about rivals to Redshift in this section.
In this Ask the Expert item, consultant Dan Sullivan compares Amazon Redshift and Microsoft's Azure SQL Data Warehouse and details some of the key features in the latter technology.
IBM's dashDB cloud data warehouse service now supports MPP and the increasingly popular R analytical programming language, additions aimed at boosting query speeds and simplifying analytics work.
Cloud-oriented companies like Netflix and game developer Kixeye are also warehousing and analyzing data in the cloud, the latter with data warehouse software from startup Snowflake Computing.
Microsoft began a "limited public preview" of Azure SQL Data Warehouse, and consultant and early user Denny Cherry shares his insights on some highlights and growing pains of the new technology.
Data warehouse vendor Teradata has made its namesake database software available on the Amazon Web Services cloud, a move driven by growing user interest in cloud-based data warehousing options.
As Microsoft tries to catch up to AWS in the public cloud market, two of the new technologies it's counting on to help close the gap are Azure SQL Data Warehouse and its SQL Server 2016 database.
Cloud data warehouse terminology
Read definitions of technologies and processes often associated with data warehousing, in the cloud as well as on premises.
- data warehouse
- data warehouse as a service (DWaaS)
- data warehouse appliance
- Amazon Web Services (AWS)
- Amazon Redshift
- analytic database
Test your Amazon Redshift knowledge
How much do you know about Amazon's cloud data warehouse offering? Take this brief quiz to find out.