Making connections: Big data algorithms walk a thin line

Big data algorithms are at the heart of some travel snafus of late. Increasingly, data analytics teams need to look deeper into how algorithms work and the data on which they feed.

Spinal Tap had it right. As bassist Derek Albion Smalls and guitarist David St. Hubbins concluded in that send-up of the rockumentary, "There's a thin line between clever and stupid." We find that is true in data analytics more often these days.

Yes, the difference between a clever algorithm and a stupid one can be very thin. As data analytics becomes big data analytics, batch analytics become real-time analytics and algorithms become big data algorithms, there are reasons to think the line is getting even thinner.

This comes to mind in the wake of a summer replete with news of bungled algorithms and stranded air passengers. This can hit close to home. Don Fluckinger, our colleague at SearchCRM, witnessed this firsthand and described it ably in a recent piece on bungled seat assignments and a cancelled flight. The airline blamed it on an algorithm. Excuses like this are now as much a part of summer as sunscreen.

Air passengers have encountered the algorithmic daemon too often of late, as airlines push their luck with seating algorithms that boost revenue, but that underestimate the risk of bumping passengers.
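The trade-off between fuller planes and bumped passengers can be made concrete with a toy model. What follows is a minimal sketch, not any airline's actual algorithm. It assumes each ticketed passenger shows up independently with the same probability, and the seat count, ticket counts and show-up rate are hypothetical numbers chosen only for illustration.

```python
from math import comb


def prob_bumping(seats, tickets_sold, show_rate):
    """Probability that more ticketed passengers show up than there are seats.

    Assumes (for illustration only) that each passenger shows up
    independently with probability show_rate, i.e. a binomial model.
    """
    if tickets_sold <= seats:
        return 0.0  # no overbooking, so nobody can be bumped
    # Sum the binomial probabilities of every outcome with too many arrivals.
    return sum(
        comb(tickets_sold, k)
        * show_rate ** k
        * (1 - show_rate) ** (tickets_sold - k)
        for k in range(seats + 1, tickets_sold + 1)
    )


if __name__ == "__main__":
    # Hypothetical flight: 100 seats, 90% of ticket holders show up.
    for sold in (100, 105, 110, 115):
        print(sold, "tickets sold -> bump risk", round(prob_bumping(100, sold, 0.90), 4))
```

Each extra ticket sold adds expected revenue, but the bump risk climbs quickly once sales pass the point where the expected number of arrivals approaches the seat count. An algorithm tuned on an optimistic show-up rate will underestimate that risk.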

This all started as a marvelous example of computer-driven logistics. Back in the day, planes often flew with dozens of empty seats. People could almost count on being able to spread out on those long, half-empty overnights. But computers got better and better at filling planes.

Learning from disaster

Today, seat assignment is an established science -- to the point where the airlines have begun employing assignment algorithms that push their luck.

When things become routine, there can be trouble lurking, especially if human judgement takes a back seat to the word of the machine, a notion that was reinforced in an interesting discussion of data science at last month's MIT Chief Data Officer and Information Quality Symposium.

The presentation by analytics veteran Sid Dalal centered on the use of big data, real-time computing and machine learning models, and the change these capabilities may bring upon insurance -- an industry traditionally focused on risk analysis.

Since 2013, Dalal has been chief data scientist and senior vice president at New York-based AIG. As he discussed the transformation underway in big data algorithms for analytics, he focused on the human side of the equation as much as the technology side. Human decision-making and discernment are required, Dalal noted. How people present analytical data and how they act upon analytics is crucial.

While at Bell Labs in the 1980s, Dalal said, he worked with a team that looked back on the 1986 Challenger space shuttle disaster to find out if the event could have been predicted. It is well-known that engineering teams held a tense teleconference the night before the launch to review data that measured risk. Ultimately, a go was ordered, even though temperatures at Cape Canaveral, Fla., were much lower than for any previous shuttle launch.

Looking at fuller data sets

Dalal told MIT symposium attendees that the original analysis made on the eve of the Challenger flight was biased because data was missing. But it goes deeper than that, he said. Teams had gathered data that could well have led to a decision to scrub the launch, but it was not included in the preflight analysis.

The data had to do with O-rings, which helped seal the joints between segments of the shuttle's solid rocket boosters. These O-rings were known to have reliability issues. What wasn't understood was the correlation between O-ring problems and lower temperatures. In Dalal's words, "they were looking at only the obviously bad events."

Looking at a fuller data set, most anyone could see a correlation between O-ring damage and temperature. The decision-makers were presented a snapshot, however -- a flawed one.
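A small sketch shows how dropping the uneventful cases can hide a pattern. The flight records below are hypothetical values invented for illustration and are not the actual shuttle record. The 65-degree cutoff and the function names are likewise assumptions.

```python
# Hypothetical (launch temperature in F, O-ring damage incidents) pairs.
# These values are made up for illustration; they are NOT the real flight data.
FLIGHTS = [
    (52, 2), (58, 1), (61, 1), (64, 0), (66, 0), (67, 0), (68, 0),
    (70, 1), (70, 0), (72, 0), (73, 0), (75, 1), (76, 0), (78, 0), (80, 0),
]
COOL_LIMIT_F = 65  # hypothetical cutoff between "cool" and "warm" launches


def split_by_temperature(records, limit):
    """Split records into (cool, warm) lists around the temperature limit."""
    cool = [r for r in records if r[0] < limit]
    warm = [r for r in records if r[0] >= limit]
    return cool, warm


def damage_rate(records):
    """Share of flights with at least one incident, or None if there are no flights."""
    if not records:
        return None
    return sum(1 for _, incidents in records if incidents > 0) / len(records)


# View 1: only the "obviously bad events" (flights that had damage).
incidents_only = [r for r in FLIGHTS if r[1] > 0]
cool_inc, warm_inc = split_by_temperature(incidents_only, COOL_LIMIT_F)

# View 2: the fuller data set, including flights with no damage.
cool_all, warm_all = split_by_temperature(FLIGHTS, COOL_LIMIT_F)

if __name__ == "__main__":
    print("Incidents only:", len(cool_inc), "cool flights vs", len(warm_inc), "warm flights")
    print("All flights: cool damage rate", damage_rate(cool_all),
          "vs warm damage rate", damage_rate(warm_all))
```

In the incidents-only view, damage appears on both cool and warm days, so temperature looks like a weak factor. Only when the trouble-free flights are added back does it become clear that nearly all of them were warm-weather launches, which is what makes the cool-day damage rate stand out.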

But there were larger issues around the decision to launch the Challenger. One often cited was an endemic problem in the space shuttle program, one traced to the actual project title. Calling the vehicle a shuttle implied that operations would be routine, as Dalal noted. Experience, including the later loss of the shuttle Columbia, eventually gave the lie to the notion that things were routine.

Dalal offered the Challenger experience as a cautionary tale for data science. People need to understand the data they are dealing with and use good sense along with their analytical and machine learning models.

"Human judgement, along with data, is important," Dalal said. "A symbiosis between the human and the machine is actually critical."

Inscrutable big data algorithms

Today, discussions about algorithms often center on the black box algorithm, typically a machine learning or deep learning model. This is an algorithm that reaches a conclusion but offers no rationale for it.

In some industries -- and finance is certainly one of them -- there is distrust of the black box algorithm that does its magic, but does not explain how or why. An interesting take on the problem comes by way of Andrew Burt, with whom we recently spoke.

The path to understanding the machine learning model's conclusions begins with a deeper understanding of the data itself, according to Burt, chief privacy officer and legal engineer at Immuta, a company working to bring more data governance to advanced analytics endeavors.

Eventually, he said, organizations will come to govern machine learning by looking closely at the data that is fed in, the nature of the model itself and the decisions the model is making. In the meantime, they may have to walk a thin line.
