
Data managers should study up on GPU deep learning

As GPU deep learning becomes more common, data managers will have to navigate several new layers of complexity in their quest to build or buy suitable data infrastructure.

Jack Vaughan

Published: 02 Feb 2018

AI-related deep learning and machine learning techniques have become a common area of discussion in big data circles. The trend is something for data managers to keep an eye on for a number of reasons, not the least of which is the new technologies' potential effect on modern data infrastructure.

Increasingly at the center of the discussion is the graphics processing unit (GPU), which has become an established figure on the AI landscape. GPU deep learning has been bubbling under the surface for some time, but the pace of development is quickening.

Deep learning is the branch of machine learning that trains neural networks with many layers, usually on very large data sets. GPU deep learning is a particularly potent combination of hardware infrastructure and advanced software aimed at use cases ranging from the recommendation engine to the autonomous car.

Today's GPUs have very high memory bandwidth that enables them to crunch big data with gusto -- think matrix multiplication. As a result, they have an affinity for the type of parallel processing that deep learning requires. This is particularly useful at the training stage of deep learning model creation.
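To see why matrix multiplication suits parallel hardware, consider the minimal Python sketch below. It is an illustration only, not how a GPU library is written: the function names and the sample numbers are hypothetical, and the code runs the work sequentially on a CPU. The point is that every output cell is computed from its own row and column, with no dependence on any other cell.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    # Each (i, j) cell needs only row i of a and column j of b.
    # No cell waits on another, so all n * m cells could be computed at once.
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


def multiply_add_count(n, k, m):
    """Number of multiply-add operations in an (n x k) by (k x m) product."""
    return n * k * m


# Hypothetical values: two input rows passed through a 2 x 2 weight matrix.
inputs = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.5, -1.0], [0.25, 2.0]]
outputs = matmul(inputs, weights)
```

A neural network layer repeats this pattern at far larger sizes. By the count above, a single product of two 1,024 x 1,024 matrices already involves more than a billion multiply-adds, which is the kind of independent, repetitive arithmetic a GPU spreads across its many cores.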

GPUs are still relatively rare compared to CPUs. As a result, the GPU chips cost more, as do the developers who can work with them. Also, their use is expanding beyond AI deep learning, as they show up in graph databases, Apache Spark and, most recently, Apache Hadoop 3.0 YARN implementations.

'Til there was GPU

Data managers who remember the world before all-purpose 32-bit CPUs may recall floating point coprocessors and array processors, as well as the special uses they served in some applications.

What should managers look out for when a new generation of developers tells them their applications need to move to a math-heavy GPU infrastructure? The answer is that there are a lot of moving parts involved.


For one, things work one way when the job is limited to a handful of GPUs running on a single server. According to Bernard Fraenkel, there are gotchas to consider when deep learning jobs go beyond the single server. It's not just about the chip.

"When you reach the point that you need to use more than one server, then you probably don't have real guarantees that the bandwidth between the two machines will be acceptable," said Fraenkel, who is the practice manager at Silicon Valley Software Group, a technology consulting practice based in San Francisco. "It's hard to foresee the overhead of the inter-server communications."
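A rough back-of-the-envelope model can show how much the link between servers matters. The Python sketch below is an assumption-laden illustration, not something Fraenkel described: it assumes the servers synchronize their model updates with a ring all-reduce, in which each server sends about 2 x (N - 1) / N times the model size per training step, and it assumes communication does not overlap with computation. The function names and all the numbers are hypothetical.

```python
def allreduce_seconds(model_bytes, servers, link_bytes_per_sec):
    """Estimated time for one ring all-reduce of the model's updates."""
    if servers < 2:
        return 0.0  # a single server has nothing to exchange over the network
    # Assumption: ring all-reduce, where each server sends 2 * (N - 1) / N
    # times the model size per synchronization.
    traffic = 2 * (servers - 1) / servers * model_bytes
    return traffic / link_bytes_per_sec


def communication_share(compute_seconds, model_bytes, servers, link_bytes_per_sec):
    """Fraction of each training step spent waiting on the network.

    Assumption: communication and computation do not overlap.
    """
    comm = allreduce_seconds(model_bytes, servers, link_bytes_per_sec)
    return comm / (compute_seconds + comm)


# Hypothetical workload: 100 million 4-byte parameters, 0.5 s of compute per step.
MODEL_BYTES = 100_000_000 * 4
COMPUTE_SECONDS = 0.5

slow_link = 10e9 / 8    # 10 Gbit/s expressed in bytes per second
fast_link = 100e9 / 8   # 100 Gbit/s expressed in bytes per second

share_slow = communication_share(COMPUTE_SECONDS, MODEL_BYTES, 2, slow_link)
share_fast = communication_share(COMPUTE_SECONDS, MODEL_BYTES, 2, fast_link)
print(f"share of step spent communicating: {share_slow:.0%} vs {share_fast:.0%}")
```

Under these assumed numbers, the same two-server job spends a far larger share of each step on the network over the slower link. Real systems overlap communication with computation and compress traffic, so measured overhead can differ considerably from such an estimate, which is part of why it is hard to foresee.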

Inter-server issues surface

Inter-server issues have led cloud providers, server houses and chipmakers to seek improvements at the board and server level, Fraenkel said. But with each improvement, a GPU deep learning implementation can become more closely tied to the system it has been running on, and it can become harder to successfully migrate to another system. That is a gotcha.

In addition, cloud providers and others are working to optimize software to run on their setups, and this too becomes an encumbrance to rehosting your deep learning application on premises or in other clouds. Also, migrations that require additional computation will take a bigger piece of your budget, of course.

What is important, Fraenkel emphasized, is to understand that things are changing very fast in this area. That means data managers should take special heed, even if GPU deep learning is not on their immediate agenda.

"We are still at an early stage of [the] application of artificial intelligence. Algorithms, as well as hardware -- such as chips, servers and data centers -- are still evolving rapidly," Fraenkel said.


Moreover, CIOs especially need to learn about all the layers involved in building these applications.

"They should be evaluating and becoming cogent of all these moving parts now," Fraenkel said. "It's not something that you pick up in one quarter."

There is precedent for this type of infrastructure upheaval, but there is also much about it that is new.

Advances in big data analytics influenced infrastructure changes in recent years -- columnar databases and distributed file systems come to mind straightaway -- but any changes at the chip level were usually slight.

Deep learning, particularly, seems to be a different animal -- one calling for, as Fraenkel observed, system changes at the chip level and above. Also, GPUs, machine learning and deep learning may be among several inflection points for big data analytics in the future, as a slew of new artificial intelligence chips are being prepared for special use cases.

For now, many managers will be watching GPU-based deep learning activity as spectators, not participants. But taking a keen interest now may be wise, as GPU deep learning is moving very quickly and may have a significant impact.

