
McKesson looks to simplify big data ecosystem for healthcare analytics

The big data ecosystem has many twists and turns. A McKesson data manager saw Splice Machine's database as a means to straighten the path by putting analytics and operations data in one place.

Data management professionals today face sea change upon sea change in the way data is handled. They are gathering ever-larger and more varied amounts of data, trying out machine learning and navigating a new big data ecosystem of data management tools -- all at once.

Still, essential tenets hold true, despite the rolling seas. Healthcare analytics is a case in point. Analyzing fast-arriving data can lead to useful insights, but handling that data is the first step, according to a practitioner in healthcare analytics.

"The important thing is to get the data in a format in which the machine can work on it," said Manuel Salgado, senior data and analytics manager at healthcare giant McKesson Corp.

Salgado holds that an important first step in working with today's data is to simplify data management. That can be hard, given the surfeit of new tools available in a big data ecosystem, including frameworks such as Hadoop, HBase, Spark and many, many more.

Eliminating data silos

To reduce complexity while building a data pipeline for analytics, Salgado and McKesson opted for a hybrid database from Splice Machine for some projects. Splice Machine is called a hybrid database because it supports both transactional and advanced analytics jobs, and it comes prebuilt with connections to various elements of the big data ecosystem.

"We realized the ecosystem for big data is not as mature as traditional data management," Salgado said. "We were dealing with a lot of components, and we looked for a way to make it easier."

The objective in using the hybrid approach was to eliminate data silos, reduce data movement and cut down on the number of moving parts, according to Salgado.


Splice Machine, in effect, does a lot of the necessary integration for customers: its architecture layers a SQL relational engine on top of the HBase NoSQL database for transaction processing and plugs in Spark for analytics, distributing work across the nodes of a Hadoop cluster. Along the way, it handles both analytics and operational data functions, and it provides a single management console.
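The hybrid idea can be sketched in miniature. The snippet below uses Python's built-in SQLite as a stand-in for Splice Machine's SQL layer (the real product is reached through standard SQL tooling); the table name and values are invented for illustration. The point is that one engine serves the transactional writes and the analytical query against a single copy of the data, with no export step in between.

```python
import sqlite3

# Illustration only: SQLite stands in for a hybrid SQL engine.
# Table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id INTEGER, provider TEXT, amount REAL)")

# Operational side: transactional inserts as records arrive.
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(1, "clinic_a", 120.0), (2, "clinic_b", 80.0), (3, "clinic_a", 200.0)],
)
conn.commit()

# Analytical side: an aggregate runs against the same table --
# no ETL job moves the data into a separate warehouse first.
rows = conn.execute(
    "SELECT provider, SUM(amount) FROM claims GROUP BY provider ORDER BY provider"
).fetchall()
print(rows)  # [('clinic_a', 320.0), ('clinic_b', 80.0)]
```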

"In relational databases, such as Oracle and SQL Server, the database takes care of the details of the data management tasks. But that is difficult with Hadoop running by itself. It is just a file system," Salgado said. "At the end of the day, you have to manage those files."

He said he wanted to ensure that analysts and developers were not spending too much time managing the complexity of highly scaled data processing, and that Splice Machine helped in this regard. Salgado said the approach helped simplify data management, while reducing data movement.

"We are able to get the data into Splice Machine and do modeling and machine learning there," he said. "We can call up Spark or [Google] TensorFlow machine learning libraries and not have to move data around."

The result is that analytics and modeling occur in the same place, "as opposed to a lot of data round trips," Salgado said.
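The "no round trips" pattern looks roughly like this. McKesson calls Spark and TensorFlow libraries against Splice Machine; the sketch below substitutes SQLite and a tiny closed-form regression purely to show the shape of the workflow -- query the table and fit the model in the same place, with no export file in between. The visits/cost numbers are invented.

```python
import sqlite3

# Illustration: data is queried and modeled in one process,
# standing in for in-database Spark/TensorFlow workloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (visits REAL, cost REAL)")
conn.executemany(
    "INSERT INTO encounters VALUES (?, ?)",
    [(1, 110.0), (2, 210.0), (3, 290.0), (4, 410.0)],
)

# Modeling step runs directly on the query result -- no dump-to-file,
# no separate modeling environment. Ordinary least squares, closed form.
xy = conn.execute("SELECT visits, cost FROM encounters").fetchall()
n = len(xy)
mean_x = sum(x for x, _ in xy) / n
mean_y = sum(y for _, y in xy) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in xy) / sum(
    (x - mean_x) ** 2 for x, _ in xy
)
intercept = mean_y - slope * mean_x
print(slope, intercept)  # 98.0 10.0
```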

Driving business goals

Salgado's approach to today's big data ecosystem is tempered by experience. He's been involved with data management for many years. Most recently, those data management skills have been turned toward predictive and prescriptive analytics, as well as machine learning.

"In healthcare, we are essentially trying to find out how we can make physicians' decisions more efficient," Salgado said. "We have several projects underway that are trying to harness data we have in the system and drive models from it."

At the moment, big data is very much a moving target, in his estimation. Different elements in the ecosystem show different levels of maturity.

"We realized the ecosystem for big data is not as mature as traditional data management," Salgado said. "We were dealing with a lot of components, and we looked for a way to make it easier."

It is not that his team is unenthusiastic about cutting-edge technology, he noted; rather, business goals have to drive its adoption.

"As much as we love to get into the weeds, we have to abstract up a bit from the technology in order to deliver to business," Salgado said.

Industry observer Mike Matchett said data management is likely to see more such applications that mix operational and analytical processes. The hybrid approach can also help bridge the gap between existing data and new machine learning workloads.

With software like Splice Machine, said Matchett, senior analyst and consultant at Taneja Group, users can take legacy applications and add machine learning without doing a big rewrite. Such support will become more crucial as organizations apply greater analytics acumen to operational data.
