DeepDream: How Alexander Mordvintsev Excavated the Computer’s Hidden Layers
Early in the morning on May 18, 2015, Alexander Mordvintsev made an amazing discovery. He had been having trouble sleeping. Just after midnight, he awoke with a start. He was sure he’d heard a noise in the Zurich apartment where he lived with his wife and child. Afraid that he hadn’t locked the door to the terrace, he ran out of the bedroom to check if there was an intruder. All was fine; the terrace door was locked, and there was no intruder. But as he was standing in his living room, suddenly he “was surrounded by dozens of very beautiful ideas,” he tells me. “That beautiful moment occurred when one idea crystallizes to a point where you can start programming.” It all came together. In an instant he saw what everyone else had missed. He sat down straight away at his computer and began to type lines of code.
Up until then, artificial neural networks — designed to mimic the brain and recognize patterns — had been our servants, dutifully performing the tasks we asked them to perform, becoming steadily better at serving us. Mordvintsev’s adventure that night was to transform completely our conception of what computers were capable of. His great idea was to let them off the leash, see what happened when they were given a little freedom, allowed to dream a little. He let loose the computer’s inner workings, and tapped into their mysterious hidden layers. Who would have guessed that they would throw up wild images not a million miles from van Gogh’s “Starry Night”?
Mordvintsev comes across as focused, intense. He’s more relaxed when he recalls his childhood, growing up in St. Petersburg. Mordvintsev graduated in 2010 from Saint Petersburg State University of Information Technologies, Mechanics, and Optics with a master’s in computer science, then went to work for a company specializing in marine-training simulators, using computers to model biological systems — in this case, coral reefs. Assigned to a computer vision group, he quickly became fascinated with the field. Computer vision is all about reinventing the eye, teaching computers to see, developing computers that can understand digital images and audio — that is, pictures and sound. This appealed to Mordvintsev greatly.
When Mordvintsev’s first child was born, he and his wife decided that big cities were not the place for children. It was then that he received a call from a Google recruiter in Zurich, offering him a job there.
On arrival, he was worried. There were not many teams that did computer vision, and he was “a computer vision guy.” Worse, he was assigned to a team specializing in SafeSearch, preventing spam and porn from infecting search results. Nevertheless, he had the chance to wander around Google and mingle. Chatting with his fellow workers, Mordvintsev was struck by what he was learning about the power of deep neural networks. While he was once skeptical, he was now in a place with access to huge data caches and the most up-to-date machines. As he put it, he quickly realized how deep neural networks could “really shine.”
Our brains are made up of at least 100 billion interconnected nerve cells (neurons), linked by a barely understood thicket of 100 trillion connections. For us to see, the neurons in our brains pick up the images we receive on our retinas and give them shape and meaning. The process of creating an image out of a jumble of visual impressions begins in the primary visual cortex, which identifies lines and edges. This basic sketch is passed along — as on a factory assembly line — to higher regions of the brain, which fill in shapes, spot shadows, and build up noses, eyes, and faces. The final image is put together using information from our memories and from language, which helps us categorize and classify images into, for example, different types of dogs and cats.
Artificial neural networks are designed to replicate the activities of the brain and to cast light on how the brain works. Convolutional neural networks (ConvNets), which captured Mordvintsev’s attention at Google, are a specialized form devoted mainly to vision, able to recognize objects and spot patterns in data. The neurons are arranged in a way similar to those in the eye. ConvNets have up to 30 layers, each made up of many thousands of artificial neurons, which is why they are called deep neural networks. The neurons in each layer are able to detect signals but are much less complex than the brain’s nerve cells. The number of neurons and connections in a ConvNet is tiny in comparison to the number of nerve cells and connections in the human brain. For now, artificial neural networks are more akin to the brain of a mouse. But the assembly line process by which the machine recognizes an image is similar to the way in which we see.
To train a ConvNet, you feed in millions of images from a database such as ImageNet, which is made up of over 14 million URLs of images, annotated by hand to specify the content. The network’s adjustable parameters — the connections between the neurons — are tuned until it has learned the classifications so that when it is shown an image of, for example, a particular breed of dog, it recognizes it.
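The training loop described above — show the network labeled examples and nudge its adjustable connections until the classifications come out right — can be sketched in miniature. The toy below (plain Python; the two-dimensional data points and the single artificial neuron are invented for illustration, and a real ConvNet tunes millions of parameters rather than three) trains one neuron to reproduce a set of hand-annotated labels:

```python
# Toy illustration of supervised training: adjust the connection
# weights until the network reproduces the hand-annotated labels.
# (Invented 2-D data, one neuron -- not the real ImageNet pipeline.)

def predict(weights, bias, x):
    # A single artificial neuron: weighted sum of inputs, thresholded at 0.
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, labels, lr=0.1, epochs=50):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - predict(weights, bias, x)   # 0 when already correct
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

# "Annotated by hand": points above the line y = x are labeled 1.
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.5), (3.0, 1.0)]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
print([predict(weights, bias, x) for x in samples])  # -> [1, 1, 0, 0]
```

Once trained, the tuned weights play the role the passage describes: shown a new example, the neuron classifies it using what it has learned.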
When you then show it an image and ask it to identify it, the ConvNet works in a similar way to the human brain. First, the early layers pick out lines in the map of the pixels that make up the image. Each layer in succession then picks out more and more details, building up parts of faces, cars, pagodas, whatever images it has inside its memory. The deeper you go, the more abstract the information gets. Finally, in the last layer, all the results of analyzing features in the pixels are assembled into the final image, be it a face, a car, a dog, or any of the millions of images that the neural net was trained on.
The crucial point is that the machine does not see a cat or dog, as we do, but a set of numbers. The image is broken up into pixels. Each pixel is represented by numbers that give its color on a red, green, and blue scale and its position. In other words, it’s numbers all the way down. In the first layer, a filter illuminates areas of the pixel map one at a time, seeking out lines and edges, convolving — hence the term convolutional neural network. Then it transmits this primitive sketch to the next layer. The filters operate in the same way in each layer to clarify and identify the target image.
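The filtering step can be made concrete. In the sketch below (plain Python; the 5×5 "image" and the filter values are invented for illustration — a trained network learns its filters rather than using a hand-written one), a small edge-detecting filter slides across a grid of pixel numbers, computing a weighted sum at each position. That sliding weighted sum is the convolution that gives the network its name:

```python
# Convolve a tiny grayscale "image" with a 3x3 vertical-edge filter.
# Each pixel is just a number; the filter responds where dark meets light.

image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Hand-written edge filter: negative on the left, positive on the right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, ker):
    n, k = len(img), len(ker)
    out = []
    for r in range(n - k + 1):
        row = []
        for c in range(n - k + 1):
            # Weighted sum of the 3x3 patch currently under the filter.
            row.append(sum(ker[i][j] * img[r + i][c + j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

for row in convolve(image, kernel):
    print(row)  # -> [27, 27, 0]: large values mark the vertical edge
```

The big numbers in the output map sit exactly where dark pixels meet light ones — a primitive sketch of the image's edges, ready to be passed to the next layer.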
Finally, the last layer comes up with probabilities of what the image actually is. If the network has been asked to identify a dog, the conclusion might be 99.99 percent probability that it’s a dog, with correspondingly low probabilities for every other class in the data it’s been trained on — cat, lion, car, and so on. This is what makes a Google reverse image search possible: Google runs your image through a trained ConvNet and returns its best guess.
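The standard way a final layer turns raw scores into probabilities of this kind is the softmax function, sketched below (the class names and score values are invented for illustration):

```python
import math

# The last layer's raw scores ("logits") become probabilities that
# sum to 1 via softmax.  Scores and class names here are invented.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["dog", "cat", "lion", "car"]
logits = [9.2, 1.1, 0.3, -2.0]           # the network's raw output
probs = softmax(logits)
best = max(zip(probs, classes))
print("best guess:", best[1])            # -> dog
```

Because the dog score dominates, almost all of the probability mass lands on "dog," with every other class receiving a tiny share — just as the passage describes.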
Before deep neural networks, the filters in each of the layers had to be hand-engineered, a formidable task. In ConvNets, they emerge as a natural consequence of training.
One of the first successes of artificial neural networks was to read the numbers on checks. Now they can recognize faces, find patterns in data, and power driverless cars. Most scientists were satisfied to leave it at that. The unasked question was, What is the machine’s reasoning? What occurs in the layers of neurons between the input layer that receives the image to be identified and the output layer where the solution emerges? These are the hidden layers, so called because they are neither input nor output; they are inside the machine. Mordvintsev became obsessed with finding out not only why ConvNets worked so well but why they worked at all, how they reasoned, what goes on in the hidden layers.
He set to work even though this problem was not part of his official duties, which related to SafeSearch. Google has a policy of allowing its engineers to spend up to 20 percent of their time, or one day a week, on some other Google-related project. Researchers, of course, can’t just turn their minds on and off. The passion of your inquiry stays with you, either in your conscious or, more likely, unconscious mind.
A team from Oxford University had published papers that provided clues as to how best to proceed. They explained that when the computer was being fed with images, the pixels that made up the image were converted into numbers. To investigate how the ConvNet worked, the researchers stopped the process part way into the hidden layers and adjusted the connections between the neurons in that layer so that the machine saw an approximation of the target image. They were trying to find out what the network saw, what was going on inside its “brain.” They discovered that the images in the various layers, though blurry, still resembled the target image.
Following the same lines, Mordvintsev was sure the hidden layers were not just black boxes, taking in data and producing results. He saw them as “transparent but very, very obscure.” And that was when, not long after midnight on that early May morning, everything suddenly fell into place. Mordvintsev sat down and wrote the code that encapsulated his breakthrough, enabling him to explore how a neural network works layer by layer.
Instead of looking at what features of the original image are contained in a particular layer and then generating those features in the form of pixels to produce an approximate impression of the original image, as the Oxford team had done, Mordvintsev did the reverse. He fed an image into a ConvNet that had been trained on the ImageNet data, but stopped the forward progress part way through. In other words, he stepped on the brakes. When the network was still in the middle of trying to verify a nascent sense that a particular pattern might be a target object, he told it to generate it right then and there.
The intermediate layers in a network are made up of many thousands of interconnected neurons containing a bit of everything the network has been trained on — in this case, the ImageNet dataset with lots of dogs and cats. If the target image has, for example, even a hint of dog in it, the relevant artificial neuron in that layer will be stimulated to emphasize the dogness. Then you reverse the process repeatedly back and forth and see what emerges. As Mordvintsev puts it, “Whatever you see there, I want more of it.” Google engineers, according to Jessica Brillhart, the principal filmmaker for virtual reality at Google, refer to this as “brain surgery.” Whereas the Oxford group tried to reconstruct the original image, Mordvintsev’s great idea was to keep the strengths of the connections between the neurons fixed and let the image change. As he put it modestly: “There were many suggestions about understanding neural networks. Mine was very practical.”
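The heart of the trick — hold the trained connections fixed and change the image so that whatever the layer already responds to gets amplified — reduces to a short gradient-ascent loop. The toy below (plain Python; a single invented linear "neuron" with frozen weights stands in for a trained layer, whereas the real system backpropagates through a full ConvNet) nudges the input pixels uphill along the activation's gradient:

```python
# "Whatever you see there, I want more of it": gradient ascent on the
# input image, with the trained weights held fixed.  One linear neuron
# stands in for a whole layer; its frozen weights are invented here.

weights = [0.5, -0.2, 0.8, 0.1]      # frozen "feature detector"

def activation(image):
    return sum(w * p for w, p in zip(weights, image))

def dream(image, step=0.1, iterations=20):
    img = list(image)
    for _ in range(iterations):
        # For a linear neuron, the gradient of the activation with
        # respect to each pixel is simply that pixel's weight, so each
        # step pushes the pixels in the direction the neuron likes.
        img = [p + step * w for p, w in zip(img, weights)]
    return img

original = [0.2, 0.2, 0.2, 0.2]
dreamed = dream(original)
# The neuron's response grows: the image now contains "more of it."
print(round(activation(original), 3), "->", round(activation(dreamed), 3))
```

Note what stays fixed and what moves: the Oxford group held the image's identity as the target and reconstructed it, while here the weights are the constant and the image itself is rewritten, step by step, into whatever the network faintly perceives.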
So much for theory. But what happened in practice?
The first image that Mordvintsev used was of a cat and a beagle.
Normally, if you fed this through the machine, it would identify both the cat and the dog, having been trained on ImageNet with its images of 118 breeds of dog plus several cats. Mordvintsev fed in just the cat portion of the image and stopped part way through the hidden layers, bursting with neurons containing a mash-up of dog and cat features, as well as whatever else is in ImageNet. The result of passing this image through several times was the aptly named “nightmare beast.” Nothing like it had ever been seen — a thing with two sets of eyes on its head and another set on its haunches, and eyes and canine attributes bursting out all over its body: not entirely surprising, as the network had been trained more on dogs than cats. The background too had been transformed into complex geometric patterns, with a couple of spiders bursting through. It seems the machine saw spiders there, even though we hadn’t. It was a vision of the world through the eyes of the machine.
Mordvintsev sat up until 2:00 a.m. writing a report full of images like that of the cat. He applied his algorithm at multiple scales, producing big and small cat-like creatures simultaneously, producing images with fractal properties and a look that can only be called psychedelic. The crucial point is that the machine was producing images that were not programmed into it.
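Running the algorithm at multiple scales works by building a pyramid of progressively smaller copies of the image, amplifying patterns in the coarsest copy first, then upsampling and repeating at each finer level. The sketch below (plain Python on a 1-D "image"; the simple multiply-by-1.5 amplification is an invented stand-in for the real per-layer dream step) shows that octave structure:

```python
# Multi-scale ("octave") application: dream on a small version of the
# image first, then upsample and dream again, so amplified patterns
# appear at several sizes at once.  1-D toy; the dream step below is
# an invented stand-in for the real per-layer amplification.

def dream_step(img):
    # Stand-in amplification: exaggerate whatever is already there.
    return [p * 1.5 for p in img]

def downscale(img):
    return [(img[i] + img[i + 1]) / 2 for i in range(0, len(img), 2)]

def upscale(img):
    out = []
    for p in img:
        out.extend([p, p])
    return out

def multiscale_dream(img, octaves=3):
    # Build a pyramid of progressively smaller versions...
    pyramid = [img]
    for _ in range(octaves - 1):
        pyramid.append(downscale(pyramid[-1]))
    # ...then dream from coarsest to finest, blending detail back in.
    result = pyramid[-1]
    for level in reversed(pyramid[:-1]):
        result = upscale(dream_step(result))
        result = [(r, l) and (r + l) / 2 for r, l in zip(result, level)]
    return dream_step(result)

print(multiscale_dream([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]))
```

Because each octave amplifies patterns at its own scale, the output carries big and small versions of the same motifs at once — the fractal, psychedelic look described above.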
Our human perceptual system behaves in a similar way, making us “see” things that aren’t really there, like the face on the moon or pictures in clouds or the happy face in the Galle crater on Mars, an illusion called pareidolia. The dream-like images generated by Mordvintsev’s algorithm were almost hallucinogenic, startlingly akin to those experienced by people on LSD. Did that mean that an artificial neural net was not that artificial? Could we say that Mordvintsev had found a way to look into the machine’s unconscious, into its inner life, its dreams? Certainly he’d found a way to plumb its hidden layers.
Finally, Mordvintsev posted several of his images on an internal Google website, assuming that no one would notice them for a while. “I was wondering how to get back to sleep,” he recalls.
But the sun never sets on Google. It was late afternoon in Mountain View, California, and Google headquarters was in full swing. Mordvintsev’s images went viral. The unknown engineer from the innocuous SafeSearch team in Zurich received rave reviews.
Arthur I. Miller is Emeritus Professor of History and Philosophy of Science at University College London. He is the author of, among other books, “Colliding Worlds: How Cutting-Edge Science is Redefining Contemporary Art” (W.W. Norton) and “The Artist in the Machine,” from which this article is adapted.