Data is a valuable asset for many companies, but trying to determine how much it actually costs to produce and maintain data is no easy task.
It's a challenge that Ascend.io in Palo Alto, Calif., is taking on with its new Ascend Govern enterprise data governance capabilities, announced this week as part of the Ascend Unified Data Engineering Platform. One of the new features in Ascend Govern is a data lineage service that manages and tracks where all data came from. That service is complemented by automated data cataloguing and by secure data feeds that connect data pipelines. Resource and cost reporting is another key component of Ascend Govern, giving organizations insight into the financial resources required to produce and maintain data.
One of the early adopters of Ascend's new capabilities is HNI Corp., a large office furniture manufacturing company in Muscatine, Iowa. Tom Kozlowski, vice president of decision science at HNI, said his firm uses Ascend to manage data across data science and engineering teams, totaling roughly 500 million records from two data repositories.
"With Ascend, my team can easily ingest this data and quickly build multi-stage data pipelines with complex business logic," Kozlowski said. "Ascend also runs our Spark and Kubernetes infrastructure for data processing inside Azure, spinning up jobs as needed autonomously without any manual effort from my engineers."
Kozlowski added that with all of HNI's data pipelines automatically kept up to date, data users in the company can integrate live data sets with their preferred business intelligence tools such as Tableau and Power BI, or machine learning workflows inside Python and Spark. He said HNI is already using many of the new enterprise data governance capabilities extensively.
"We standardize all key data sets with secure data feeds, so that groups of data analysts can get access to the exact data set they need instead of raw data sets that can be difficult to decipher," he said. "We also use resource and cost reporting to help us understand which data sets and use cases are the most costly to inform us on where we should focus our optimization efforts."
Ascend Unified Data Engineering Platform is a rebranded offering
The Ascend Unified Data Engineering Platform itself is a rebranding of a prior offering from Ascend known as the Autonomous Dataflow Service, which was updated in Sept. 2019 with a Queryable Dataflows capability.
Sean Knapp, founder and CEO of Ascend.io, said that while some users grasped the concept of data flows, others had a challenge trying to figure out how to make that concept fit within their existing frameworks. The Unified Data Engineering Platform now restructures Ascend's Dataflow Service features into build, integration and deployment phases.
"What we figured out, and why we actually started to repackage the product in this way, is it maps really cleanly to what a data engineer does in their day-to-day, and so far, it's really helped people figure out how to leverage the various pieces," Knapp said.
Most of the new Ascend Govern capabilities are integrated as part of the core Unified Data Engineering Platform. One exception, however, is the data lineage feature, which Knapp called a premium offering that will incur additional cost.
Using data lineage to improve enterprise data governance
With data lineage info, it's possible for users to accurately track and determine where data came from and how it was acquired, which is a critical element of enterprise data governance.
Knapp said that tracking for data lineage begins at the data ingestion phase, where users define data connectors to different sources such as APIs, data warehouses or data lakes. Ascend builds profiles and digital fingerprints of the data so when changes occur they are tracked.
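Ascend has not published how its fingerprints are computed. As a rough illustration of the general idea of fingerprinting ingested data so that changes can be detected, here is a minimal sketch that assumes each ingested partition is a list of JSON-serializable records. The functions `fingerprint` and `detect_changes` are hypothetical names used for illustration only, not part of any Ascend API.

```python
import hashlib
import json
from typing import Iterable, Mapping


def fingerprint(records: Iterable[Mapping]) -> str:
    """Return an order-independent SHA-256 fingerprint of a batch of records.

    Hypothetical helper for illustration; not Ascend's implementation.
    """
    # Serialize each record canonically (sorted keys) so field order doesn't matter.
    row_hashes = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode("utf-8")).hexdigest()
        for r in records
    )
    # Hash the sorted row hashes so record order doesn't matter either.
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()


def detect_changes(previous: dict, current: dict) -> dict:
    """Compare partition fingerprints from two ingestion runs (hypothetical)."""
    return {
        "added": sorted(k for k in current if k not in previous),
        "removed": sorted(k for k in previous if k not in current),
        "modified": sorted(
            k for k in current if k in previous and current[k] != previous[k]
        ),
    }


# Two ingestion runs keyed by partition (hypothetical sample data).
run1 = {
    "2020-01-01": fingerprint([{"id": 1, "qty": 5}]),
    "2020-01-02": fingerprint([{"id": 2, "qty": 3}]),
}
run2 = {
    "2020-01-01": fingerprint([{"qty": 5, "id": 1}]),  # same data, different key order
    "2020-01-02": fingerprint([{"id": 2, "qty": 4}]),  # value changed
    "2020-01-03": fingerprint([{"id": 3, "qty": 1}]),  # new partition
}
print(detect_changes(run1, run2))
# {'added': ['2020-01-03'], 'removed': [], 'modified': ['2020-01-02']}
```

Because only the fingerprints need to be compared, a system built this way can tell which partitions changed between runs without diffing the underlying data row by row.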
"We know definitively for every piece of data that sits in the system where it came from, what code ran on it, why it ran on it, and then where it went and how it was used," Knapp said.
The economics of enterprise data governance
With visibility into where data comes from and an understanding of how it's used, it's also possible to get insight into costs.
Knapp said Ascend's platform exposes metrics on how many resources a given data pipeline and data cluster consumed to run a particular analysis operation, and those metrics feed Ascend Govern's resource and cost reporting capability. Going a step further, Ascend uses its understanding of costs and resource utilization to automatically optimize data pipelines to reduce cost.
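The article does not describe Ascend's cost model. As an illustration of the general approach of turning per-job resource metrics into a per-data-set cost ranking, the kind of view HNI uses to decide where to focus optimization, here is a minimal sketch. The `JobRun` record shape and the unit prices are hypothetical assumptions, not Ascend's metrics or real cloud rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRun:
    """One pipeline job execution (hypothetical record shape)."""
    dataset: str
    cpu_hours: float
    memory_gb_hours: float


# Hypothetical unit prices; real rates depend on the cloud provider and contract.
CPU_HOUR_RATE = 0.04
GB_HOUR_RATE = 0.005


def cost_by_dataset(runs: list) -> list:
    """Sum the cost of every run per data set and sort, most expensive first."""
    totals = defaultdict(float)
    for run in runs:
        totals[run.dataset] += (
            run.cpu_hours * CPU_HOUR_RATE + run.memory_gb_hours * GB_HOUR_RATE
        )
    # Round for display; sort descending by cost.
    return sorted(
        ((name, round(cost, 2)) for name, cost in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


runs = [
    JobRun("orders", cpu_hours=100, memory_gb_hours=400),
    JobRun("orders", cpu_hours=50, memory_gb_hours=200),
    JobRun("inventory", cpu_hours=20, memory_gb_hours=80),
]
print(cost_by_dataset(runs))
# [('orders', 9.0), ('inventory', 1.2)]
```

Aggregating at the data-set level, rather than per job, is what lets a team see which data sets and use cases are the most costly overall.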
"We do a lot of really heavily automated cost optimizations," Knapp said. "We find it saves our customers more money and usually makes our system run faster, too."