
Deft preprocessing marks deep learning techniques for data preparation


Deep learning techniques for data preparation include exploration of data sets and algorithms. This calls for more than a bit of art on the part of data engineers and scientists.

For new classes of deep learning applications, data preparation is something of a moving target; in some cases, sophisticated preprocessing is itself part of the preparation. These and related issues were discussed in a reporters' podcast review of the recent Deep Learning Summit 2017 in Boston.

Some deep learning techniques for data preparation are familiar, while others are new. There is noise -- that is, false or misleading data -- in almost any raw data set, whether it feeds BI or more advanced deep learning efforts.

In deep learning today, there is exploration and experimentation on the front end. That work is pursued by data engineers, data scientists and others. What is new here is that deep learning applications can require significant preprocessing to sort the signal from the noise.

As with traditional BI, deep learning data can require data transformation to ensure some kind of unified view. That way, data can be effectively compared. But deep learning techniques also include preprocessing that enables team members to explore data sets and algorithms.
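To make the transformation step concrete, here is a minimal sketch of one common technique for unifying data: min-max scaling, which maps each numeric feature onto the [0, 1] range so that columns measured in different units can be compared. The function name and the sample columns below are hypothetical illustrations, not from the podcast.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: no spread to scale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Two features on very different scales (illustrative values):
revenue = [120_000.0, 80_000.0, 200_000.0]  # dollars
visits = [15.0, 5.0, 10.0]                  # counts

# After scaling, both columns live in [0, 1] and can be compared.
scaled_revenue = min_max_scale(revenue)
scaled_visits = min_max_scale(visits)
```

Standardization (subtracting the mean and dividing by the standard deviation) is a common alternative when features should be compared by how far they deviate from typical values rather than by their position in a fixed range.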

As part of the preparation process, data scientists may choose to work on sample subsets of the full data set. Sampling and removal of outliers can be controversial in deep learning, however, where some practitioners believe that more data is always better.
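The sampling-and-trimming step described above can be sketched as follows. This is one simple approach, assuming numeric data: draw a random subset, then drop points whose z-score (distance from the sample mean in standard deviations) exceeds a cutoff. The function name, the cutoff of 3.0, and the example values are illustrative assumptions, not from the podcast.

```python
import random
import statistics

def sample_and_trim(values, sample_size, z_cutoff=3.0, seed=42):
    """Draw a random subset of `values`, then drop points more than
    `z_cutoff` sample standard deviations from the subset's mean."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    subset = rng.sample(values, min(sample_size, len(values)))
    mean = statistics.fmean(subset)
    stdev = statistics.stdev(subset)
    if stdev == 0:  # all points identical: nothing to trim
        return subset
    return [x for x in subset if abs(x - mean) / stdev <= z_cutoff]

# Ten typical readings plus one extreme value (illustrative data);
# the extreme point is dropped, the typical ones survive.
trimmed = sample_and_trim([100.0] * 10 + [10_000.0], 11)
```

Whether trimming like this helps or hurts is exactly the controversy noted above: an "outlier" may be noise, or it may be the rare signal a model most needs to learn.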

Team members may also need to decide if their problem is amenable to stand-alone, rather than distributed, processing, according to the participants in this edition of the Talking Data podcast.

One key takeaway from the deep learning conference: Although they have garnered a lot of attention for some time, the techniques for distributed deep learning applications are still new. So, the algorithms that underlie distributed deep learning can be experimental in nature. Single-machine deep learning can be a better choice, depending on the amount of memory and processing required for the job.
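The single-machine-versus-distributed decision often starts with a back-of-envelope memory estimate like the sketch below. Every figure here is a hypothetical assumption for illustration (float32 values, a rough 3x overhead for gradients and optimizer state, a 64 GiB machine); real sizing depends on the framework and workload.

```python
def fits_on_one_machine(num_params, batch_rows, features_per_row,
                        ram_gib=64, bytes_per_value=4, overhead=3.0):
    """Rough estimate of whether a training job fits in one machine's RAM.

    Counts model parameters (inflated by `overhead` to approximate
    gradients and optimizer state) plus one batch of input values,
    all stored as `bytes_per_value`-byte floats. Illustrative only.
    """
    model_bytes = num_params * bytes_per_value * overhead
    batch_bytes = batch_rows * features_per_row * bytes_per_value
    needed_gib = (model_bytes + batch_bytes) / (1024 ** 3)
    return needed_gib <= ram_gib, round(needed_gib, 2)

# A 100M-parameter model with a 256 x 10,000 input batch easily
# fits in 64 GiB by this estimate -- no distributed setup needed.
ok, needed = fits_on_one_machine(100_000_000, 256, 10_000)
```

When an estimate like this comes out well under a single machine's capacity, the podcast participants' point applies: the mature single-machine path may beat a still-experimental distributed one.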

Next Steps

How machine learning and deep learning techniques stack up

Learn about infrastructure for deep learning applications

Review the steps for putting deep learning into production

Deep learning changes the data infrastructure status quo