Illustration by Hailey Lee

How a Google Researcher Is Making AI Easier to Understand

John Pavlus
Magenta
5 min read · Jan 10, 2019

It’s known as the “black box problem”: Because deep-learning models are trained, rather than programmed, there’s no way to pop the hood to see how they work. Been Kim has designed a human-centered solution to the problem.

One of Google’s most important designers doesn’t have the word “design” anywhere in her job description. Officially speaking, Been Kim is a senior research scientist at Google AI working on machine learning. But when she describes her work in her own words, the d-word shows up right away: She is “interested in designing high-performance machine learning methods that make sense to humans.” On her GitHub homepage, where that quote appears, Kim has emphasized the words “to humans” in boldface type.

That emphasis matters. To nonexperts like you and me, “high-performance machine learning methods” means AI. And one of the fascinating and/or spooky things about AI systems is that their inner workings often don’t make sense to humans — not even the experts.

Been Kim

For example, an AI system using deep-learning neural networks (the technology that delivers the aforementioned high performance) may be trained to identify images of zebras with near-perfect accuracy. But the very fact that the system was trained, rather than traditionally programmed, makes it impossible to “pop the hood” and identify the exact mechanism that accomplishes the zebra-identifying.

This peculiarity — that a machine can make reliable, accurate decisions without its own makers knowing how it does so — has become known in AI circles as “the black box problem.” It has even spawned its own research subfield, aspirationally labeled “explainable,” “transparent,” or “interpretable” AI. But unlike many of the scientists, programmers, engineers, and academics working to crack the black box, Kim says her work is driven by “a core perspective of human-centered design.”

“Some would suggest that we have to write down what ‘interpretability’ means in math before we even address this problem,” she continues. “That’s the way that we solve problems as engineers. The idea of thinking about interpretability the other way around — thinking about the human first before designing the system — is a very foreign concept.”

Kim’s human-first thinking begins with the very meaning of “interpretability” itself. Some researchers (including those funded by DARPA’s XAI initiative) imagine that a solution to the opacity of AI will depend on transparency — by turning the black box into a glass box. Kim isn’t so sure. “If you put up 20,000 lines of code, OK, yes, that’s transparent,” she says. “But it’s not something that is easy for anybody to gain meaningful insights from.” Kim’s vision for interpretable machine learning is informed by more pragmatic values: usability, feedback, verification, safety. Users of AI (including technical experts) don’t need glass boxes, because there’s nothing useful to “see”; instead, they need something more interactive — a way of “asking” the system how it “thinks,” and getting meaningful answers.

Such an interaction sounds “human-centric” to the point of absurdity. (“Say, AI, how do you recognize those zebras?”) But Kim has actually built a working version of it called TCAV, which stands for “testing with concept activation vectors.” Without going into the complicated science and mathematics behind it, TCAV lets users test the relevance of specific, high-level concepts to a machine-learning model’s decision-making process.

To return to the zebra-identifier example: Using TCAV, a human could define a natural-language concept like “stripes” in terms of several visual examples, feed those examples into the zebra-identifying AI, and then receive a “TCAV score” between 0 and 1. A higher score means that the concept — “stripes,” in this example — is highly relevant to the AI system’s process for identifying a zebra. (An irrelevant concept, like “blue,” would likely receive a TCAV score close to zero.)
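To make that workflow concrete, here is a minimal, illustrative sketch of the idea behind a TCAV score. It is not Google’s actual tcav library: the activations are synthetic stand-ins for one hidden layer of the image model, and the gradient of the “zebra” logit is mocked, whereas a real implementation would pull both from the trained network via autodiff. The steps, though, mirror the description above: fit a linear classifier that separates “concept” activations from random ones, treat its weight vector as the concept activation vector, and report the fraction of class examples whose score increases in that direction.

```python
# Minimal TCAV-style sketch (illustrative only; not Google's tcav library).
# Idea: learn a "concept activation vector" (CAV) that separates activations
# of concept examples (e.g. "stripes" images) from random examples, then
# measure how often nudging activations along that vector raises the class score.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder activations: in practice these come from one hidden layer of the
# trained image model, evaluated on concept, random, and target-class images.
acts_concept = rng.normal(loc=1.0, size=(50, 128))   # "stripes" examples
acts_random  = rng.normal(loc=0.0, size=(50, 128))   # random counterexamples
acts_zebra   = rng.normal(loc=0.8, size=(40, 128))   # images of the target class

# 1) Fit a linear classifier; its normalized weight vector is the CAV.
X = np.vstack([acts_concept, acts_random])
y = np.array([1] * len(acts_concept) + [0] * len(acts_random))
clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# 2) Directional derivative of the class logit along the CAV.
# The gradient of the "zebra" logit w.r.t. the activations is mocked here;
# a real implementation would obtain it from the network.
def grad_zebra_logit(activation):
    return activation * 0.1  # placeholder gradient

# 3) TCAV score: fraction of class examples whose logit increases along the CAV.
directional_derivs = [grad_zebra_logit(a) @ cav for a in acts_zebra]
tcav_score = np.mean([d > 0 for d in directional_derivs])
print(f"TCAV score for 'stripes' on 'zebra': {tcav_score:.2f}")
```

A near-zero score for an irrelevant concept like “blue” is exactly the kind of sanity check described next.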

Not only can TCAV help a human being “know what the AI is thinking” in concrete terms, it also lets experts do sanity checks on machine learning models without having to manually untangle every connection in their deep neural networks. If, say, the zebra-identifying AI appears to be working well but returns a high TCAV score for an irrelevant concept like “blue,” then it’s clear that something is still amiss within the black box. Kim describes TCAV as a “translator for humans,” but in a way, it also works like the Voight-Kampff machine from Blade Runner: as a means of exposing a machine trying to pass for something it’s not.

“Creating trust [in an AI system] is not our goal,” Kim asserts. “That [trust] might come with interpretability, but that’s not our goal. Our goal is to provide insight into how machine learning models are actually making decisions.”

Kim thinks human-centered interpretability methods like TCAV could help machine learning systems achieve a new level of mainstream adoption in the same way that graphical user interfaces (GUIs) helped democratize PCs. “It’s a good analogy,” she says, “but there’s another layer to it. A GUI makes the computer ‘interpretable’ to users, but as an engineer, you can still investigate exactly what each line of code is doing. With machine learning models, you can’t even do that.” In other words, TCAV (or a tool like it) might be the interactive “GUI” that black-box AI needs — but it won’t just be designed for end-user noobs. The people creating and training the AI systems will need it, too.

Magenta is a publication of Huge.

I write & make films about science, tech, design, math, and other ways that people make things make sense. johnpavlus.com / pavlusoffice.com / mindfun.biz