Coming to San Diego directly out of the Code for America program in 2014, Maksim Pecherskiy was eager to advance the city's open data initiative and make city data more transparent to citizens. In his view, the effort would make the city's government more efficient while bringing the power of data to bear on civic problems.
He kept that enthusiasm even as he took on his first, burdensome task as chief data officer for the City of San Diego: taking an inventory, or creating a data catalog of the data assets held by the city's many departments. Doing so, and learning the underlying data catalog software, was an education for all concerned, he said.
But as the process unfolded, Pecherskiy found a welcome side effect to the opening up of data for San Diego residents -- the data catalog proved to be a big step toward opening up data to San Diego city workers, as well.
The cataloging project was a big undertaking. "Initially, it was just me. It was a manual process," he said. "Eventually, I worked with about 65 coordinators to create the data catalog."
The data sets documented in the data catalog were diverse: relational data from multiple versions of Oracle and SQL Server databases; geospatial data; Excel spreadsheet data; information from internet of things devices, such as smart parking meters; and more.
"We had to educate people in terms of what we were looking for when we said data," he said. "The result was the first holistic look at the data we had."
Streets of San Diego
An early part of the open data initiative was StreetsSD, which is a website that lets residents interactively view street conditions and repair plans. The application opened up new views of data for city departments too, and a data catalog helped in that mission.
During the course of creating his data catalog, Pecherskiy discovered an issue common to data integration projects -- different departments had different definitions for similar data elements. An example was a mile.
"For the streets maintenance department, a mile was a paved mile, but for other departments, it was more like the mile as most of us know it," he said.
To sort through such vocabulary issues, and to format data from diverse silos for wider consumption, Pecherskiy used Alation Inc.'s data catalog software. It enabled his team to create metadata to explain the lineage of data points, as well as to trace queries launched against data sets. In that sense, it acts as an interactive data catalog.
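The article does not describe Alation's internal data model, so the following is only a minimal, hypothetical Python sketch of the two catalog ideas Pecherskiy mentions: recording plain-language definitions (so a "paved mile" and an ordinary mile can be told apart) and recording which data sets each data set is derived from (lineage). The class names, fields, data set names and the "Fleet" department are illustrative assumptions, not Alation's API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in a hypothetical, minimal data catalog."""
    name: str
    department: str
    source_system: str
    # Business glossary: term -> plain-language definition used by this data set
    definitions: dict[str, str] = field(default_factory=dict)
    # Names of upstream catalog entries this data set is derived from
    upstream: list[str] = field(default_factory=list)


class DataCatalog:
    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def lineage(self, name: str) -> list[str]:
        """Return every upstream data set of `name`, depth-first, without duplicates."""
        seen: list[str] = []
        stack = list(self.entries[name].upstream)
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self.entries:
                stack.extend(self.entries[parent].upstream)
        return seen

    def definitions_of(self, term: str) -> dict[str, str]:
        """Map department -> definition for every entry that defines `term`."""
        return {
            e.department: e.definitions[term]
            for e in self.entries.values()
            if term in e.definitions
        }


catalog = DataCatalog()
catalog.register(CatalogEntry("street_segments", "Streets", "Oracle",
                              definitions={"mile": "a paved mile"}))
catalog.register(CatalogEntry("trip_log", "Fleet", "SQL Server",
                              definitions={"mile": "a standard mile of travel"}))
catalog.register(CatalogEntry("paving_report", "Streets", "Excel",
                              upstream=["street_segments"]))
catalog.register(CatalogEntry("paving_dashboard", "Streets", "Web",
                              upstream=["paving_report"]))

print(catalog.definitions_of("mile"))
# {'Streets': 'a paved mile', 'Fleet': 'a standard mile of travel'}
print(catalog.lineage("paving_dashboard"))
# ['paving_report', 'street_segments']
```

Surfacing every department's definition of the same term side by side is what exposes a vocabulary conflict like the one over "mile"; a real catalog would also attach owners, update schedules and query history to each entry.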
Data catalogs: The new black
According to "Data Catalogs Are the New Black in Data Management and Analytics," a recent report from analyst group Gartner, such data cataloging technology is finding wider use as the need to document and curate inventories of distributed data grows. Gartner counts Alation, along with Attivio, Collibra, Cambridge Semantics, Informatica and others, as providers of data catalog software.
Use of the Alation software goes beyond simple cataloging, as described by Pecherskiy. "The Alation software helps you look at query logs," he said. "It also provides a web interface that supports self-service for users."
Having such an "abstraction layer across the diverse systems" is an essential element in the City of San Diego's open data initiative, he continued. "We couldn't have done any of the things we needed to do if we didn't understand the data."
Additionally, a useful part of the Alation suite, said Pecherskiy, is an R language package that helps automate steps in data handling processes for reporting.
Another tool used in the overall program, he said, was Airflow, workflow scheduling software for data tasks. Online accommodations broker Airbnb originally developed Airflow and later contributed it to the open source community; according to Pecherskiy, the software helps manage the data workflow.
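Airflow represents a workflow as a directed acyclic graph (DAG) of tasks and runs each task only after the tasks it depends on have finished. The sketch below is not Airflow's API; it only illustrates that dependency-ordering idea using Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical and do not describe San Diego's actual pipelines.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]]) -> list[str]:
    """Run every task after all of its upstream tasks, returning the run order.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        # Register each task with its (possibly empty) set of upstream tasks.
        sorter.add(name, *depends_on.get(name, set()))
    order: list[str] = []
    for name in sorter.static_order():
        tasks[name]()  # a KeyError here means a dependency names an unknown task
        order.append(name)
    return order


# Hypothetical tasks for an open data refresh; each body just reports its name.
def make_task(name: str) -> Callable[[], None]:
    def task() -> None:
        print(f"running {name}")
    return task


task_names = ["extract_parking_meters", "extract_street_conditions",
              "clean_and_join", "publish_open_data", "refresh_internal_report"]
tasks = {name: make_task(name) for name in task_names}
depends_on = {
    "clean_and_join": {"extract_parking_meters", "extract_street_conditions"},
    "publish_open_data": {"clean_and_join"},
    "refresh_internal_report": {"clean_and_join"},
}
run_order = run_pipeline(tasks, depends_on)
```

Both extract tasks run before `clean_and_join`, which in turn runs before the two downstream tasks; the exact order among independent tasks is not guaranteed. A scheduler such as Airflow adds what this sketch leaves out: timed triggers, retries, parallel execution and a record of each run.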
Data catalog discovery
The process of opening up data has, in turn, given the city a window into all kinds of its own data. With a better view into the city's data collections, planners can make more informed decisions, Pecherskiy said.
Information on street paving plans, for example, can reduce a common source of citizen outrage -- the recently paved street that is subsequently dug up to lay a new water pipe.
Open data has also helped planners optimize routes used by city trucks moving supplies to reduce traffic congestion at peak times of day, and to cut order backlog, according to Pecherskiy.
Such examples show how open data initiatives for the public can become catalysts for better use of data internally, he said. This can help create a new mindset around data.
"Now, people are turning more toward thinking of data as an asset and problems as things that can be solved by using data creatively," Pecherskiy said.