Is Artificial Intelligence Permanently Inscrutable?

Dmitry Malioutov can’t say much about what he built.

As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a large insurance corporation. It was a challenging assignment, requiring a sophisticated algorithm. When it came time to describe the results to his client, though, there was a wrinkle. “We couldn’t explain the model to them because they didn’t have the training in machine learning.”

In fact, it may not have helped even if they were machine learning experts. That’s because the model was an artificial neural network, a program that takes in a given type of data—in this case, the insurance company’s customer records—and finds patterns in them. These networks have been in practical use for over half a century, but lately they’ve seen a resurgence, powering breakthroughs in everything from speech recognition and language translation to Go-playing robots and self-driving cars.

HIDDEN MEANINGS: In neural networks, data is passed from layer to layer, undergoing simple transformations at each step. Between the input and output layers are hidden layers, groups of nodes and connections that often bear no human-interpretable patterns or obvious connections to either input or output. “Deep” networks are those with many hidden layers.

Michael Nielsen /

As exciting as their performance gains have been, though, there’s a troubling fact about modern neural networks: Nobody knows quite how they work. And that means no one can predict when they might fail.

Take, for example, an episode recently reported by machine learning researcher Rich Caruana and his colleagues. They described the experiences of a team at the University of Pittsburgh Medical Center who were using machine learning to predict whether pneumonia patients might develop severe complications. The goal was to send patients at low risk for complications to outpatient treatment, preserving hospital beds and the attention of medical staff. The team tried several different methods, including various kinds of neural networks, as well as software-generated decision trees that produced clear, human-readable rules.

The neural networks were right more often than any of the other methods. But when the researchers and doctors took a look at the human-readable rules, they noticed something disturbing: One of the rules instructed doctors to send home pneumonia patients who already had asthma, despite the fact that asthma sufferers are known to be extremely vulnerable to complications.

The model did what it was told to do: Discover a true pattern in the data. The poor advice it produced was the result of a quirk in that data. It was hospital policy to send asthma sufferers with pneumonia to intensive care, and this policy worked so well that asthma sufferers almost never developed severe complications. Without the extra care that had shaped the hospital’s patient records, outcomes could have been dramatically different.
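To make the quirk concrete, here is a minimal sketch with entirely made-up numbers. It is not the Pittsburgh data or the researchers' method. The record layout, the counts, and the function names are all assumptions chosen for illustration. Every hypothetical asthma patient receives intensive care, so the observed complication rate for that group is low, and a naive rule learner that looks only at outcomes draws the dangerous conclusion.

```python
# Hypothetical records: (has_asthma, got_intensive_care, severe_complication).
# All counts are invented for illustration only.
def make_records():
    records = []
    # 100 asthma patients: policy sends all of them to intensive care; 2 have complications.
    records += [(True, True, i < 2) for i in range(100)]
    # 900 other patients: usual care; 90 have complications.
    records += [(False, False, i < 90) for i in range(900)]
    return records


def complication_rate(records, has_asthma):
    """Observed share of severe complications within one group."""
    group = [r for r in records if r[0] == has_asthma]
    return sum(r[2] for r in group) / len(group)


def learned_rule(records):
    """Naive rule: treat whichever group has the lower observed rate as low risk."""
    asthma_rate = complication_rate(records, True)
    other_rate = complication_rate(records, False)
    if asthma_rate < other_rate:
        return "send asthma patients home"
    return "admit asthma patients"


if __name__ == "__main__":
    data = make_records()
    print(complication_rate(data, True), complication_rate(data, False))
    print(learned_rule(data))
```

The rule is a faithful summary of the records. What the records leave out is that the low rate depends on the intensive care column, which the naive learner never consults.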


The hospital anecdote makes clear the practical value of interpretability. “If the rule-based system had learned that asthma lowers risk, certainly the neural nets had learned it, too,” wrote Caruana and colleagues—but the neural net wasn’t human-interpretable, and its bizarre conclusions about asthma patients might have been difficult to diagnose.1 If there hadn’t been an interpretable model, Malioutov cautions, “you could accidentally kill people.”

This is why so many are reluctant to gamble on the mysteries of neural networks. When Malioutov presented his accurate but inscrutable neural network model to his own corporate client, he also offered them an alternative, rule-based model whose workings he could communicate in simple terms. This second, interpretable, model was less accurate than the first, but the client decided to use it anyway—despite being a mathematically sophisticated insurance company for which every percentage point of accuracy mattered.

“They could relate to [it] more,” Malioutov says. “They really value intuition highly.”

Even governments are starting to show concern about the increasing influence of inscrutable neural-network oracles. The European Union recently proposed to establish a “right to explanation,” which would allow citizens to demand transparency for algorithmic decisions.2 The legislation may be difficult to implement, however, because the legislators didn’t specify exactly what “transparency” means. It’s unclear whether this omission stemmed from ignorance of the problem or from an appreciation of its complexity.


In fact, some believe that such a definition might be impossible. At the moment, though we can know everything there is to know about what neural networks are doing (they are, after all, just computer programs), we can discern very little about how or why they are doing it. The networks are made up of many individual units, sometimes millions, called neurons. Each neuron converts many numerical inputs into a single numerical output, which is then passed on to one or more other neurons. As in brains, these neurons are divided into “layers,” groups of cells that take input from the layer below and send their output to the layer above.
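The description of neurons and layers can be sketched in a few lines of code. This is a minimal illustration, not the implementation of any particular network: the sigmoid squashing function, the function names, and the example weights are all assumptions made for the sketch.

```python
import math


def neuron(inputs, weights, bias):
    """Convert many numerical inputs into a single numerical output."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid squashing is an assumed choice; real networks use various functions.
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(inputs, layers):
    """Pass data from layer to layer; each layer feeds the one above it."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


if __name__ == "__main__":
    # Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
    hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
    output = ([[1.0, -1.0, 0.5]], [0.0])
    print(network([1.0, 2.0], [hidden, output]))
```

Every step here is fully visible, which is the sense in which we know what the network is doing. Why a given set of weights produces a useful answer is a separate question that the code does not answer.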

Neural networks are trained by feeding in data, then adjusting the connections between layers until the network’s calculated output matches the known output (which usually consists of categories) as closely as possible. The incredible results of the past few years are thanks to a series of new techniques that make it possible to quickly train deep networks, with many layers between the first input and the final output. One popular deep network called AlexNet is used to categorize photographs, labeling them according to such fine distinctions as whether they contain a Shih Tzu or a Pomeranian. It consists of over 60 million “weights,” each of which tells a neuron how much attention to pay to one of its inputs. “In order to say you have some understanding of the network,” says Jason Yosinski, a computer scientist affiliated with Cornell University and Geometric Intelligence, “you’d have to have some understanding of these 60 million numbers.”
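The idea of adjusting connections until the output matches the known answer can be shown on a toy scale. The sketch below trains a single neuron with plain gradient descent, which is an assumed choice of method since the text does not name one. The task (a logical AND of two inputs), the learning rate, and the step count are likewise assumptions. A real deep network repeats the same kind of nudge across millions of weights rather than three numbers.

```python
import math


def predict(weights, bias, inputs):
    """One neuron: weighted sum of the inputs, squashed into the range 0..1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def train(examples, steps=5000, learning_rate=0.5):
    """Nudge the weights a little after each example to shrink the output error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        for inputs, target in examples:
            error = predict(weights, bias, inputs) - target
            # Gradient-descent update for a sigmoid neuron with cross-entropy loss.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training set: known outputs for a logical AND of two inputs.
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

if __name__ == "__main__":
    weights, bias = train(AND_EXAMPLES)
    for inputs, target in AND_EXAMPLES:
        print(inputs, target, round(predict(weights, bias, inputs), 3))
```

Even in this tiny case, the trained result is just three numbers. Nothing in them announces "this is AND"; that reading comes from testing the neuron's behavior, not from inspecting the weights.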

Even if it were possible to impose this kind of interpretability, it may not always be desirable. The requirement for interpretability can be seen as another set of constraints, preventing a model from reaching a “pure” solution that pays attention only to the input and output data it is given, and potentially reducing accuracy. At a DARPA conference early this year, program manager David Gunning summarized the trade-off in a chart that shows deep networks as the least understandable of modern methods. At the other end of the spectrum are decision trees, rule-based systems that tend to prize explanation over efficacy.

WHAT VS. WHY: Modern learning algorithms show a trade-off between human interpretability, or explainability, and accuracy. Deep learning is both the most accurate and the least interpretable.


The result is that modern machine learning offers a choice among oracles: Would we like to know what will happen with high accuracy, or why something will happen, at the expense of accuracy? The “why” helps us strategize, adapt, and know when our model is about to break. The “what” helps us act appropriately in the immediate future.

It can be a difficult choice to make. But some researchers hope to eliminate the need to choose, to allow us to have our many-layered cake, and understand it, too. Surprisingly, some of the most promising avenues of research treat neural networks as experimental objects, after the fashion of the biological science that inspired them to begin with, rather than as analytical, purely mathematical objects. Yosinski, for example, says he is trying to understand deep networks “in the way we understand animals, or maybe even humans.” He and other computer scientists are importing techniques from biological research that peer inside networks the way neuroscientists peer into brains: probing individual components, cataloguing how their internals respond to small changes in inputs, and even removing pieces to see how others compensate.
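Two of the probes described above, nudging an input and removing a component, can be sketched on a toy network. This is an illustrative sketch only, not Yosinski's toolkit or method: the weights, the function names, and the choice to "remove" a unit by zeroing its activation are assumptions. Unlike the experiments described, this sketch does not retrain the network after a removal, so it shows the immediate effect rather than any compensation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fixed weights: 2 inputs -> 3 hidden units -> 1 output.
HIDDEN_WEIGHTS = [[2.0, -1.0], [0.0, 0.0], [-1.5, 3.0]]
OUTPUT_WEIGHTS = [1.0, 0.5, -2.0]


def forward(inputs, ablated=None):
    """Run the network; optionally silence one hidden unit (the 'scalpel')."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in HIDDEN_WEIGHTS]
    if ablated is not None:
        hidden[ablated] = 0.0
    output = sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)))
    return hidden, output


def sensitivity(inputs, index, delta=1e-4):
    """How much the output moves per unit of a small nudge to one input."""
    nudged = list(inputs)
    nudged[index] += delta
    return (forward(nudged)[1] - forward(inputs)[1]) / delta


def ablation_effect(inputs, unit):
    """Change in output when one hidden unit is removed."""
    return forward(inputs, ablated=unit)[1] - forward(inputs)[1]


if __name__ == "__main__":
    sample = [1.0, 0.5]
    print([sensitivity(sample, i) for i in range(2)])
    print([ablation_effect(sample, u) for u in range(3)])
```

The appeal of this experimental style is that it needs no theory of the network. It only requires the ability to intervene and measure, which is far easier in software than in a living brain.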

Having built a new intelligence from scratch, scientists are now taking it apart, applying to these virtual organisms the digital equivalents of a microscope and scalpel.

Yosinski sits at a computer terminal, talking into a webcam. The data from the webcam is fed into a deep neural net, while the net itself is being analyzed, in real time, using a software toolkit Yosinski and his colleagues developed called the Deep Visualization toolkit. Clicking through several screens, Yosinski zooms in on one neuron in the network. “This neuron seems to respond to faces,” he says in a video record of the interaction.3 Human brains are also known to have such neurons, many of them clustered in a region of the brain called the fusiform face area. This region, discovered over the course of multiple studies beginning in 1992,4, 5 has become one of the most reliable observations in human neuroscience. But where those studies required advanced techniques like positron emission tomography, Yosinski can peer at his artificial neurons through code alone.

BRAIN ACTIVITY: A single neuron in a deep neural net (highlighted by a green box) responds to Yosinski’s face, just as a distinct part of the human brain reliably responds to faces (highlighted in yellow).

Left: Jason Yosinski, et al. Understanding Neural Networks Through Deep Visualization. Deep Learning Workshop, International Conference on Machine Learning (ICML) (2015). Right: Maximilian Riesenhuber, Georgetown University