Count reporters among those looking for better ways to get data in front of their audience. In fact, the requirements of data journalism are sometimes quite like those of data analysis in the enterprise.
Both business analysts and reporters now push harder than ever to get useful data into stories -- data stories for information workers in companies, news stories for writers at news outlets. In either case, pulling useful data points out of dense spreadsheets and databases remains a difficult task.
A drive to gain insight from dense data sets is among the reasons that the Associated Press turned to a teamwork-oriented data collaboration platform from Austin, Texas-based startup data.world Inc. to deliver investigative data to its subscriber newsrooms in a more usable format.
Pulling telling data points from government and other reports is not new to the 171-year-old news organization. In recent years, the AP has done what has been called computer-assisted reporting, or data journalism, according to Ken Romano, AP's director of text and multimedia products. However, its methods for distributing that data had not kept up with its needs. They were, according to Romano, nascent.
AP wanted to do more than simply hand off databases to users, he said. It wanted to consistently deliver data to inform the stories writers produce. By using hosted news data services on the data.world platform, AP has been able to provide easier access to data for its customers and create queries on databases of national data sets that can be reused at the local level by newsrooms on deadlines, according to Romano.
In this way, he said, the queries a national AP news team creates while studying, for example, Federal Emergency Management Agency data on flood insurance coverage in one hurricane area could be reused in other locales. In a similar extension of the news team's data journalism efforts, localized versions of stories on pervasive demographic patterns in charter school enrollment could be spawned from the curated data sets compiled and distributed by AP.
"We are able to create ways for our users to query the data by, for example, just entering a zip code," Romano said.
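A reusable query of the kind Romano describes can be pictured as a single SQL statement with the zip code left as a parameter. The following is only a minimal sketch using Python's built-in sqlite3 module; the flood_policies table, its columns and the sample figures are hypothetical illustrations, not AP's data or the data.world API.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flood_policies "
        "(zip_code TEXT, policies INTEGER, households INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flood_policies VALUES (?, ?, ?)",
        [
            ("77002", 1200, 8000),
            ("77002", 300, 2000),
            ("33101", 4500, 9000),
        ],
    )
    return conn


# Written once by a national team; the zip code is the only local input.
LOCAL_COVERAGE_SQL = """
SELECT zip_code, SUM(policies), SUM(households)
FROM flood_policies
WHERE zip_code = ?
GROUP BY zip_code
"""


def coverage_for_zip(conn, zip_code):
    """Run the shared query for one zip code; return None if no rows match."""
    row = conn.execute(LOCAL_COVERAGE_SQL, (zip_code,)).fetchone()
    if row is None:
        return None
    _, policies, households = row
    return {
        "zip_code": zip_code,
        "policies": policies,
        "households": households,
        "coverage_rate": policies / households,
    }


if __name__ == "__main__":
    demo = build_demo_db()
    print(coverage_for_zip(demo, "77002"))
```

Passing the zip code as a bound parameter, rather than pasting it into the SQL text, is what lets the same query be rerun safely for any locale.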
The ability to add local, contextual facts to articles on deadline is important, he said, but equally important is the ability to do so quickly and easily.
Combining data from multiple sources and in multiple formats is part of the data service AP offers to its subscribers. AP teams will clean the data, normalize it where needed, check on the methodology behind gathering it and post relevant metadata for subscribers on data.world, Romano said.
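As a rough illustration of that cleaning and normalizing step, the sketch below merges two small sources whose column names and number formats disagree. The sample CSV text, the field names and the helper functions are all assumptions made for the example; it does not reflect AP's actual workflow.

```python
import csv
import io

# Two hypothetical sources describing the same kind of fact in different shapes.
SOURCE_A = "zip,enrollment\n2138,410\n77002,980\n"
SOURCE_B = 'ZIP Code,Students\n" 02139 ","1,250"\n'


def normalize_zip(raw):
    """Trim whitespace and restore leading zeros lost by spreadsheet tools."""
    return raw.strip().zfill(5)


def normalize_count(raw):
    """Turn strings such as '1,250' into the integer 1250."""
    return int(raw.replace(",", "").strip())


def load(text, zip_field, count_field, source):
    """Read one CSV source and map it onto a shared schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "zip_code": normalize_zip(record[zip_field]),
                "enrollment": normalize_count(record[count_field]),
                "source": source,  # keep provenance as simple metadata
            }
        )
    return rows


def combine():
    """Return both sources as one list of uniformly shaped records."""
    return load(SOURCE_A, "zip", "enrollment", "source_a") + load(
        SOURCE_B, "ZIP Code", "Students", "source_b"
    )


if __name__ == "__main__":
    for row in combine():
        print(row)
```

Keeping a source field on every record is one simple way to carry forward the kind of metadata Romano mentions.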
He agrees with the many others who cite such data preparation as the biggest obstacle to effective data science, but goes further, citing well-packaged delivery of that data as an equally pressing need.
Among those behind the efforts of data.world are Jon Loyens and Matt Laessig, two of the company's co-founders who are, respectively, chief product officer and COO. The company's goal is to build socially collaborative tools for data workers of all kinds, according to Loyens.
Such data collaboration capabilities are part and parcel of operations at financial leaders like Goldman Sachs or web powerhouses like Google, but they have not been available to smaller companies, Laessig said.
"There is so much promise to data, but so much is getting lost in silos," he said.
Launched in 2017, data.world provides an online portal that enables different participants in the data food chain to work on projects in a collaborative fashion -- to, in effect, break down the silos in order to surround the data with more useful context, Loyens said.
Under the covers, data.world's creators use some cutting-edge, semantic web-style elements to deliver such data context. For every data set that users create, data.world generates a graph database instance. That better enables flexible linking of data and metadata.
But while data.world translates common tabular data into graph structures when it is uploaded, those structures remain in the background for users, who simply see their data in familiar rows and columns.
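The general idea of holding a table as a graph can be sketched in a few lines: each cell becomes a subject-predicate-object triple, and the familiar rows can be rebuilt from those triples on demand. This is a simplified illustration under assumed naming conventions, not a description of how data.world actually stores data; the identifier scheme and function names are hypothetical.

```python
def rows_to_triples(dataset, rows):
    """Flatten a list of row dicts into (subject, predicate, object) triples."""
    triples = []
    for index, row in enumerate(rows):
        subject = f"{dataset}/row/{index}"  # hypothetical row identifier
        for column, value in row.items():
            predicate = f"{dataset}/column/{column}"  # hypothetical column identifier
            triples.append((subject, predicate, value))
    return triples


def triples_to_rows(triples):
    """Rebuild the row-and-column view that a user would see."""
    rows = {}
    for subject, predicate, value in triples:
        column = predicate.rsplit("/", 1)[-1]
        rows.setdefault(subject, {})[column] = value
    return list(rows.values())


if __name__ == "__main__":
    table = [
        {"zip_code": "77002", "policies": 1500},
        {"zip_code": "33101", "policies": 4500},
    ]
    graph = rows_to_triples("flood", table)
    print(graph[0])
    print(triples_to_rows(graph) == table)
```

Because every cell is a separate statement, extra facts about a row or a column, such as its source or units, can be attached as additional triples without changing the table's shape.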
These traits lurk under the facade of data.world. Its actual interface bears resemblance to other recent innovative technology portals.
The platform looks something like the online software development service github.com. Because it was launched in part as a demonstration of data collaboration for data practitioners of all sorts, data.world also bears some resemblance to data.gov, an Obama-era effort to open up U.S. government data.
Data news delivery
A by-product of AP's use of data.world has been increased data collaboration across different member newsrooms, Ken Romano said. The platform is being used by teams to share their experiences working with the data, much as the founders of data.world had hoped end users would.
"What we have seen is people collaborating with each other," Romano said. "Editors will jump in when they see new ideas. This has been a key."
In the internet age, the classic image of the reporter is changing. As old notions of newspaper ink stains fade, the collaborative data platform may bring new data journalism to prominence.