Among the early users of Google’s new Datastream change data capture service is grocery store chain Schnuck Markets, which operates 111 stores across Missouri, Illinois, Indiana and Wisconsin.

In a breakout session at Google’s Data Cloud Summit on May 26, Caleb Carr, principal technologist at Schnuck Markets, detailed the St. Louis, Mo.-based company’s use case for Datastream, which Google introduced at the virtual conference.

Carr noted that Schnuck has been using Google Cloud Platform for several years, for cloud storage as well as the BigQuery data warehouse. One of the challenges his team faced was ensuring that Schnuck's operational data from its on-premises Oracle database environment was available quickly and reliably for analytics workloads running in BigQuery.

The team’s initial approach was a batch job that synchronized data at periodic intervals.

"The batch nature of the process caused delays in replication and we weren't making decisions at the kind of speed we wanted," Carr said.

The batch approach also had an impact on Schnuck's network, as large volumes of data were being synchronized at different times. The company also needed to maintain dedicated staff to manage the process.

Getting data from Oracle to Google

With Datastream, Carr said getting up and running was straightforward. The first step was to enable Oracle's LogMiner, which provides Oracle database table-logging functionality. Datastream was then able to pick up the changes from the Oracle database, via a Bastion host into a Google Virtual Private Cloud instance. The Bastion host provides a secured entry point into the Google Cloud.

"One of the clear values of Datastream is the real-time access to data in BigQuery," Carr said. "For us, that means our data science team is working with up-to-date data for our machine learning models, and our programs are running with faster business insights and our stores’ teammates can better support our customers."

In addition to Datastream, Google introduced a series of new cloud data initiatives during the inaugural Data Cloud Summit. The new Analytics Hub and Dataplex systems are aimed at boosting data and analytics capabilities in the Google Cloud. All three services are now available in preview. The Analytics Hub is a central location to share analytics, while Dataplex provides a data fabric that can help organizations curate data for analysis across the cloud.

Google Datastream change data capture in the real world

Meanwhile, the Datastream change data capture (CDC) service will help bring data from multiple sources into Google's cloud data services, including the BigQuery data warehouse, Cloud SQL database and Cloud Spanner distributed SQL database.

While the name Datastream might appear to imply some form of streaming data service like Apache Kafka or Amazon Kinesis, that's not the case. IDC analyst Stewart Bond emphasized that Datastream is, first and foremost, a CDC technology.

Bond explained that CDC provides the ability to monitor source database log files for changes to data, and then capture and forward changes to a target for processing.

"Log-based change data capture is a method that has been used for many years to capture changes to data in databases in a non-invasive manner," Bond said. "It means there is no query impact on the source database, no stored procedures or triggers to write, and no shadow tables to manage."
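To make the log-based approach Bond describes more concrete, here is a minimal Python sketch of the general idea: a reader walks an ordered change log and applies each change to a target copy, remembering how far it got. This is an illustration only, not Datastream's or LogMiner's actual API. The `ChangeEvent` type, the `apply_changes` function, the log positions and the sample SKU rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical, ordered change log."""
    position: int               # increasing log position (assumed for this sketch)
    op: str                     # "INSERT", "UPDATE" or "DELETE"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(target: dict, log: list, last_position: int = 0) -> int:
    """Apply log entries newer than last_position to target.

    Returns the new checkpoint so a later run can resume from it.
    """
    for event in log:
        if event.position <= last_position:
            continue  # already applied in an earlier run
        if event.op in ("INSERT", "UPDATE"):
            target[event.key] = event.row
        elif event.op == "DELETE":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_position = event.position
    return last_position


# Made-up sample data: changes recorded by the source database's log.
change_log = [
    ChangeEvent(1, "INSERT", "sku-1", {"price": 2.99}),
    ChangeEvent(2, "UPDATE", "sku-1", {"price": 3.49}),
    ChangeEvent(3, "INSERT", "sku-2", {"price": 1.25}),
    ChangeEvent(4, "DELETE", "sku-2"),
]

replica = {}  # stands in for the analytics-side copy
checkpoint = apply_changes(replica, change_log)
```

Because the reader works only from the log and a checkpoint, it never queries the source tables, which is the non-invasive property Bond points to. A real service would also have to handle transactions, schema changes and failures, none of which this sketch attempts.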