Corporations busy accumulating big data need to implement a data supply chain to turn it into useful business information, according to Accenture's Vince Dell'Anno. As new architectures evolve, that calls for different tools at different stages, and different approaches in different industries.
For Dell'Anno -- who, not coincidentally, managed to coax "data supply chain" into his job title -- the supply chain analogy is key. Like commodities refined and delivered through a manufacturing supply chain, data also needs to be polished and brought to users.
"The challenge businesses have is how to access this data. This is forcing a dialog on many fronts within companies," said Dell'Anno, managing director of information management for the data supply chain at Accenture's analytics consulting group.
A data supply chain starts with the creation and ingestion of data. As the data is cleansed and distilled, much as crude oil is refined in the oil and gas supply chain, it gains in value. New data is often combined with other data. Ultimately it is packaged for end users in a way that helps them make good business decisions.
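The stages described above (ingest, cleanse, combine, package) can be illustrated with a minimal Python sketch. This is not Accenture's or Dell'Anno's method; the function names, the "customer,amount" input format and the region lookup are all hypothetical, chosen only to show how data might gain value as it moves from one stage to the next.

```python
from collections import defaultdict


def ingest(raw_lines):
    """Stage 1: parse raw 'customer,amount' lines into records (hypothetical format)."""
    records = []
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        amount = parts[1] if len(parts) > 1 else ""
        records.append({"customer": parts[0], "amount": amount})
    return records


def cleanse(records):
    """Stage 2: drop records with no customer or a non-numeric amount."""
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # unusable amount, discard the record
        if r["customer"]:
            clean.append({"customer": r["customer"].lower(), "amount": amount})
    return clean


def combine(records, regions):
    """Stage 3: enrich each record with a second data set (customer -> region)."""
    return [{**r, "region": regions.get(r["customer"], "unknown")} for r in records]


def package(records):
    """Stage 4: aggregate into a small summary an end user could act on."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


# Illustrative data only.
raw = ["Acme, 120.50", "acme, 30", "Globex, n/a", ", 15", "Initech, 75"]
regions = {"acme": "east", "initech": "west"}

report = package(combine(cleanse(ingest(raw)), regions))
print(report)
```

In a real deployment each stage would likely run on different tools, for example a Hadoop staging area early on and a visualization layer at the end, but the shape of the flow is the same.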
The world is a staging area
The means of gathering data have changed in recent years to include new tool types. People, Dell'Anno said, are bringing data into the open source Hadoop data platform and using it as a staging area. "I'm seeing a willingness to explore these new technologies and see what they can do," he said.
In this area, Dell'Anno sees many companies innovating around new technologies and bringing together data that is more varied and more voluminous than in the past.
Companies are re-architecting their entire data infrastructures, he said, and the work often starts with open source Hadoop parallel data processing. "I am seeing the emergence of Hadoop as part and parcel of the big data ecosystem. Some industries and some companies are further along than others," Dell'Anno said. "Almost all of them have deployed it in some way, even just as a way to lower the cost of [consuming] more data."
"They are either in production, or on their way," he said. "In fact, I don't see Hadoop as new anymore."
What is new is how companies are trying to leverage Hadoop. That means more work will be directed at improving data handling in later stages of the data supply chain, ones that focus on delivery of usable analytics derived from Hadoop.
But, he added, moving the data through the latter stages of a data supply chain can prove to be a hurdle -- especially for companies that have tens of thousands of reports in the works at any given time.
Leading-edge firms are looking for cost-effective ways of combining output from tools like Hadoop with data visualization software like Tableau, said Dell'Anno. That approach could well mean running hybrid environments -- ones that mix existing data technology with newer innovations -- at scale. And, in many instances, it could mean running reports for tens of thousands of end users, he said.
"The job is to leverage all the data that you have at your disposal. There's new data being born every day, but that doesn't necessarily translate into information," said Dell'Anno. To do that, it is necessary to learn how today's different data processing tools work together, and how to deliver results.
Up the data supply chain, or when to munge
"What I see is people moving from proofs-of-concept to looking for value in an operational way," Dell'Anno said. Along the way, he added, the volume of data can become an issue. Data professionals have to ask themselves whether they should sample the data or work with the entire data set. That can become an increasingly difficult question as Hadoop applications accumulate more and more data.
To answer the question, data analysts have to look at the data. "There is no one-size-fits-all approach," he said. Certainly, an area like fraud detection can require a larger sample size than a recommendation engine on an e-commerce website.
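One way to make the sample-or-full-set choice concrete is reservoir sampling (Algorithm R), which draws a fixed-size uniform sample from a data set too large to hold in memory. The sketch below is an illustration only, not a technique the article attributes to Dell'Anno; the function name and the parameters `k` and `seed` are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Illustrative use: sample 100 records from a stream of 100,000.
sample = reservoir_sample(range(100_000), 100, seed=42)
print(len(sample))
```

Whether such a sample is good enough depends on the question being asked. Rare events, such as fraudulent transactions, can be underrepresented in a small sample, which is consistent with the point above that fraud detection can call for a larger sample than a recommendation engine.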
Decisions on handling data vary for different data supply chains in different industries. In retail companies, for example, managers need to figure out what constitutes appropriate use of data, according to Dell'Anno. That touches on what people are starting to call "the creepy factor" in big data, with Target Corp.'s 2012 prediction of a teen customer's pregnancy being a prime example.
"You have to understand your audience in order to know how to marry, for example, ERP data with social data, or how to munge ERP and non-ERP," he said.