The growing fervor over tapping into the business value of big data has opened a new frontier for data stewardship efforts in organizations. And it's one that isn't likely to be as straightforward for data stewards as dealing with conventional structured transaction data.
That's because of the highly variable nature of collections of big data, which can include a mix of structured and unstructured data types -- transaction data, yes, but also system and network log files, information from sensors, Internet search records and text-based social networking data, to cite a few examples. Such data often comes from external systems, adding another complicating factor for data stewards, who can't exert any control over the quality and consistency of the information as it's being created.
As a result, a data stewardship framework for big data must focus on meeting quality and usability requirements on the usage side of the data governance and management process, after information has been pulled into internal systems, said David Loshin, president of consultancy Knowledge Integrity Inc.
Overall, the concept of big data governance and stewardship is still taking shape. Big data remains relatively new territory for many organizations, and data governance and stewardship processes often are immature. Because of that, few companies have attempted to combine the two in a formal way, according to William McKnight, president of McKnight Consulting Group in Plano, Texas. "Most big data programs in large enterprises are without stewardship and governance," he said. "They'll eventually come into the fold, but it's just not there yet."
Hands off my big data, data stewards!
One school of thought suggests that the nature of big data applications doesn't lend itself to heavy doses of governance and data stewardship in the first place. The idea behind big data analytics is to sift through mountains of data with highly sophisticated tools in hopes of unearthing a nugget or two that can provide strategic insights into business operations, customer preferences and the like. Data scientists might argue against any cleansing, tweaking or consolidation of the data because the clean-up efforts could skew the results of the advanced analytics applications they're looking to run.
"Many in the data scientist community want to protect data in its pure form," said Shawn Rogers, an analyst at consulting and market research company Enterprise Management Associates Inc. in Boulder, Colo. "They'll argue that if you want to apply an algorithm or predictive process to the data, you shouldn't mess with the data. They say it isn't a place for data stewardship, but rather it's a place for discovery."
Another potential difference between a data stewardship program in the big data world and one in the traditional data warehouse world is a tighter project scope. While companies are working to implement data stewardship principles at an enterprise level, big data analytics projects tend to be discrete and more targeted -- an initiative to support a voice of the customer strategy, for example. Such projects might last only a short time, requiring a different approach by data stewards.
Taking a short-term data stewardship view
"In terms of the data, the projects are huge. But in terms of the effort, they can be tightly scoped and may not be repeated," said Jill Dyché, a longtime business intelligence and data management consultant who now heads a customer best-practices advisory team at analytics software vendor SAS Institute Inc. "The trick here would be to make sure the data fits for the purposes of the big data project, not necessarily to support the long-term business strategy."
In such cases, Dyché said, "a data steward might play more of a SWAT team role" -- for example, loading data into a Hadoop file system and quickly establishing a set of business rules. "It may require a more immediate need that's less planned and one that may not show up in the IT project portfolio again for another year."
The cross-industry urgency to get big data projects underway should eventually encourage broader adoption of data governance and stewardship processes in organizations, said Jonathan Geiger, an analyst at consultancy Intelligent Solutions Inc., also in Boulder. "Because the business imperative is paramount, data stewardship becomes more important," he said. "Ultimately, what you get down to is, data is data."
About the author:
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for more than 25 years.