Pivotal Greenplum streamlines big data query optimization

The Pivotal Greenplum open source shared-nothing data warehouse delivers high query performance and throughput, and provides rapid analytics on big data.

The Pivotal Greenplum open source, massively parallel data warehouse delivers business analytics on massive volumes of data collected for machine learning and advanced data science applications.

The platform combines relational and columnar capabilities, and can be deployed on-premises as software, an appliance or a virtualized service. Pivotal Greenplum's advanced query optimization capabilities, as well as tight integration with popular analytical libraries and software stacks, help organizations develop high-performance applications.

Pivotal Greenplum features and highlights

The platform is built around a shared-nothing database management system, in which each node has its own processors, memory and storage, and it automates parallel processing of data and queries. Large enterprises and government institutions are the main users of this architecture, which supports petabyte-scale parallel loading to populate a data warehouse quickly. It is well suited to machine learning and advanced analytics.
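
As a rough illustration of the shared-nothing idea, the Python sketch below hash-distributes rows across independent segments, lets each segment aggregate only its own rows and then merges the partial results. It is a toy model, not Greenplum code: the segment count, the `customer` distribution key and all function names are assumptions made for the example, and the segments are simulated one after another rather than on separate servers.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments (assumption for the example)


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment; segments share no data."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row["customer"], num_segments)].append(row)
    return segments


def local_sum(segment_rows):
    """Aggregate using only the rows stored on one segment."""
    totals = defaultdict(float)
    for row in segment_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def scatter_gather_total(rows, num_segments=NUM_SEGMENTS):
    """Compute per-customer totals from per-segment partial results."""
    partials = [local_sum(seg) for seg in distribute(rows, num_segments)]
    merged = {}
    for partial in partials:
        # A given customer always hashes to the same segment,
        # so the partial results never overlap.
        merged.update(partial)
    return merged


rows = [
    {"customer": "acme", "amount": 10.0},
    {"customer": "globex", "amount": 5.0},
    {"customer": "acme", "amount": 20.0},
]
print(scatter_gather_total(rows))
```

Because the rows are distributed on the same key that the query groups by, no segment needs data held by another, which is the property that lets this kind of work run in parallel.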

The platform also provides polymorphic data storage to optimize efficiency and deliver high-performance compression technology to reduce the storage footprint of large data warehouses.
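
To see why the choice of storage layout matters, the hypothetical Python sketch below pivots the same rows into a column-oriented layout and applies run-length encoding to one column. Run-length encoding is used here only as a simple stand-in for compression; the text above does not say which codecs the product uses, and the function names are invented for the example.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


sales = [
    {"region": "east", "qty": 1},
    {"region": "east", "qty": 7},
    {"region": "west", "qty": 3},
]
columns = to_columns(sales)
encoded_region = rle_encode(columns["region"])
print(encoded_region)
```

Values stored together by column tend to repeat, so an encoding like this can shrink them; in a row-oriented layout the same values are interleaved with other fields, which suits row-at-a-time access better than compression.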

Pivotal Greenplum handles complex queries using its built-in, cost-based Query Optimizer, which estimates the cost of alternative query plans and chooses one that executes efficiently on large volumes of data. The advanced analytics platform enables users to run high-volume interactive and batch jobs with low latency and high throughput. The analytics framework is extensible, so users can create customized analytics and database functionality.
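
The general idea behind cost-based optimization can be sketched in a few lines of Python: enumerate candidate plans, estimate a cost for each and keep the cheapest. The cost model below (the sum of estimated intermediate row counts for a left-deep join order, with one uniform selectivity) and every name in it are simplifying assumptions, and they do not describe how Pivotal's optimizer works internally.

```python
from itertools import permutations


def plan_cost(order, row_counts, selectivity):
    """Estimate a join order's cost as the sum of intermediate result sizes."""
    rows = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Estimated size of the result after joining in the next table.
        rows = rows * row_counts[table] * selectivity
        cost += rows
    return cost


def cheapest_plan(row_counts, selectivity):
    """Try every join order and return the one with the lowest estimate."""
    return min(
        permutations(row_counts),
        key=lambda order: plan_cost(order, row_counts, selectivity),
    )


# Hypothetical table sizes and join selectivity (assumptions for the example).
row_counts = {"orders": 1000, "customers": 100, "regions": 10}
selectivity = 0.01

best = cheapest_plan(row_counts, selectivity)
naive = ("orders", "customers", "regions")
print(best, plan_cost(best, row_counts, selectivity))
print(naive, plan_cost(naive, row_counts, selectivity))
```

In this toy model the cheapest order joins the two small tables first and brings in the largest table last, which keeps the intermediate results small. Exhaustive enumeration only works for a handful of tables, so it should be read as a teaching device rather than a practical search strategy.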

Data integrity and availability are built into the fabric of Pivotal Greenplum, with in-depth capabilities for backup, recovery and availability. Its business continuity features include high availability, intelligent fault detection, fast online differential recovery, full and incremental backup, and disaster recovery. Servers can be added while the database is online and fully available.

Pivotal Greenplum also offers a rich set of security and authentication features that address enterprise policy and regulatory requirements.

For organizations that want to integrate big data projects with their data warehousing efforts, Pivotal Greenplum integrates with various big data environments, including Hadoop, in-memory data grids and object stores.

IT can manage the system using a single unified framework for monitoring, administration and workload management. The performance-monitoring framework can be used for reviewing and managing both hardware and software issues.

Pivotal Greenplum runs on a variety of Linux distributions and versions. The current version, 4.3.6, runs on the following platforms:

  • Red Hat Enterprise Linux 64-bit 6.x, 5.x;
  • SUSE Linux Enterprise Server 64-bit 10 SP4, 11 SP1, 11 SP2;
  • Oracle Unbreakable Linux 64-bit 5.5; and
  • CentOS 64-bit 6.x, 5.x.

Licensing, pricing and support

Pivotal Greenplum is commercially available as part of the Pivotal Big Data Suite (BDS), and supports multiple deployment and distribution models:

  • Software. Packaged software distribution for integration with user-provided commodity hardware running Linux.
  • Appliance. The EMC Data Computing Appliance is a fully integrated hardware and software service, available in configurations ranging from a quarter rack with four nodes to hundreds of nodes.
  • Virtualized infrastructure as a service. Deployed as a virtualized compute and storage environment.

The Pivotal Big Data Suite is sold under a flexible, subscription-based pricing model. Pivotal BDS provides users, developers and data scientists with a complete suite of data management and analytical tools.

The product is marketed and sold by Pivotal and its partners. Support is included as part of the subscription price.

There is also an open source option, Greenplum Database, licensed under the Apache License, Version 2.0. To join the Greenplum Database open source community, download the source code and register on the mailing list. All open source communications and contributions are managed through this public mailing list.
