Imagine shouting at an animal and telling from the returning echo whether it’s a dog or a horse. A team of scientists has pulled off the photographic equivalent of that trick: They teased out a 3D image of a scene by timing the reflections of light onto a simple detector. Known as temporal imaging, the new technique demonstrates the power of a type of artificial intelligence called machine learning to unearth patterns in what appears to be mere noise.
“It’s really surprising to me that they can get anything out of this system because there’s absolutely not enough information coming out of it,” says Laura Waller, a computer scientist and electrical engineer at the University of California, Berkeley, who was not involved in the work. “That just shows the ability of machine learning to solve things that seem unsolvable.”
In conventional photography, ambient light reflects off an object, and a lens focuses it onto a screen of tiny light-sensing elements, or pixels. The image is the pattern of bright and dark spots the reflected light creates. A so-called time-of-flight camera can even add depth and make a 3D image by timing exactly when a flash of light reflected from an object arrives at the various pixels.
In recent decades, researchers have invented subtler ways to capture an image using just a single pixel detector. To do that, they expose the object not to uniform illumination, but to flashes of different patterns of light that vaguely resemble QR codes—those little square barcodes found on packaging. Each pattern will reflect off different parts of the object, so that the intensity of light measured by the pixel varies with the pattern. By tracking those variations, researchers can reconstruct the image of the object.
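For readers who want to see the idea in miniature, here is a rough sketch of patterned single-pixel imaging. It rests on assumptions that are not in the reporting: the patterns are rows of a Hadamard matrix with entries of +1 and -1 (a real setup would typically realize each one as a pair of complementary on/off masks), the scene is a made-up 4-by-4 grid, and the function names are purely illustrative.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix (n a power of two) by Sylvester's construction."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(scene, pattern):
    """One single-pixel reading: total scene brightness weighted by the pattern."""
    return sum(s * p for s, p in zip(scene, pattern))


def reconstruct(readings, patterns):
    """Recover the scene. Hadamard rows are mutually orthogonal, so summing
    each pattern weighted by its reading and dividing by n undoes the measurement."""
    n = len(patterns)
    return [
        sum(readings[k] * patterns[k][i] for k in range(n)) / n
        for i in range(n)
    ]


# Hypothetical 4x4 scene, flattened row by row (1 = reflective, 0 = dark).
scene = [0, 1, 1, 0,
         1, 0, 0, 1,
         1, 1, 1, 1,
         1, 0, 0, 1]

patterns = hadamard(len(scene))                    # 16 patterns for 16 unknowns
readings = [measure(scene, p) for p in patterns]   # one number per flash
recovered = reconstruct(readings, patterns)
```

In this toy version, 16 brightness readings from one detector are enough to recover all 16 pixel values, because each pattern probes the scene in a different way.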
Now, data scientist Alex Turpin and physicist Daniele Faccio of the University of Glasgow and colleagues have invented a way to generate a 3D image with a single pixel—but without the patterned flashes. Taking advantage of a lightning-fast single-photon detector, they illuminated a scene with uniform flashes and simply measured the reflection times. With a precision of one-quarter of 1 nanosecond, the detector counts the number of photons arriving as a function of time. From that information alone, the researchers reconstruct an image of the scene.
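The detector's output can be pictured as a histogram of photon arrival times. The sketch below is only an illustration: the 0.25-nanosecond bin width comes from the precision quoted above, but the arrival times, the 25-nanosecond recording window, and the function name are invented for the example.

```python
import math

BIN_WIDTH_NS = 0.25  # timing precision quoted in the article


def arrival_histogram(arrival_times_ns, window_ns):
    """Count photons in each time bin over a recording window."""
    n_bins = int(window_ns / BIN_WIDTH_NS)
    counts = [0] * n_bins
    for t in arrival_times_ns:
        index = math.floor(t / BIN_WIDTH_NS)
        if 0 <= index < n_bins:  # ignore photons outside the window
            counts[index] += 1
    return counts


# Hypothetical arrival times, in nanoseconds after the flash.
times = [10.0, 10.1, 10.2, 13.3, 13.4, 20.0]
hist = arrival_histogram(times, window_ns=25.0)
```

The resulting list of counts, one per time bin, is the only kind of information the single detector supplies. It has no columns or rows of pixels.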
That’s surprising, Waller says, because in principle there’s no one-to-one relationship between the arrangement of objects in the scene and the timing information. For example, photons reflecting off any surface 3 meters from the detector will reach it in 10 nanoseconds, no matter the direction to the surface. At first blush, such ambiguity would appear to make the problem unsolvable. “Single pixel imaging, when I first heard the concept, I thought, ‘That should work,’” Waller says. “This, I thought, ‘That shouldn’t work.’”
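The ambiguity is easy to check with a little arithmetic. The sketch below assumes a detector at the origin and three made-up surface points, each 3 meters away in a different direction. It counts only the leg from the surface to the detector, as in the example above.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light, in meters per nanosecond


def travel_time_ns(surface_xyz, detector_xyz=(0.0, 0.0, 0.0)):
    """One-way time for light to travel from a surface point to the detector."""
    return math.dist(surface_xyz, detector_xyz) / C_M_PER_NS


# Three hypothetical points, all 3 meters from the detector.
to_the_left = (-3.0, 0.0, 0.0)
straight_ahead = (0.0, 0.0, 3.0)
off_axis = (1.0, 2.0, 2.0)  # sqrt(1 + 4 + 4) = 3

times = [travel_time_ns(p) for p in (to_the_left, straight_ahead, off_axis)]
```

All three times come out at about 10 nanoseconds, so the timing alone cannot say which direction a photon came from.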
To get past that problem, Turpin and colleagues employed a machine learning program called a neural network that can be trained to detect subtle correlations between inputs and outputs. The researchers used their flashes of light and detector to record data of a person or two moving in front of a fixed, asymmetrical background scene—their lab. At the same time, they used a time-of-flight camera to record true 3D images of the scene.
After the researchers used the two data sets to train the neural network, the program was able to image people moving in the scene by itself, they reported last week in Optica. Compared with those of the time-of-flight camera, the temporal images are blurry and lack detail. Yet they clearly reveal the shapes of people.
The neural network can decipher the ambiguous signals because, thanks to its training, it will try to conjure up only scenes and objects similar to those it has already seen. But that means the system is also limited: It must train on the precise scene that it’s going to observe.
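That limitation can be illustrated with a deliberately crude stand-in. The sketch below is not a neural network and not the researchers' method: it is a simple nearest-neighbor lookup that returns whichever training scene has the most similar timing histogram. The histograms and scene labels are invented for the example.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_scene(histogram, training_pairs):
    """Return the scene whose recorded histogram is closest to the new one."""
    _, best_scene = min(
        training_pairs,
        key=lambda pair: squared_distance(pair[0], histogram),
    )
    return best_scene


# Hypothetical training data: (photon-count histogram, label standing in for a 3D image).
training_pairs = [
    ([0, 5, 1, 0], "person on left"),
    ([0, 1, 5, 0], "person on right"),
    ([0, 0, 0, 6], "empty lab"),
]

# A new histogram resembling something seen in training.
familiar = nearest_scene([0, 4, 2, 0], training_pairs)

# A histogram from a scene unlike anything in training.
unfamiliar = nearest_scene([9, 0, 1, 0], training_pairs)
```

Both calls return one of the three training scenes, even though the second histogram looks like none of them. A trained network is far more flexible than a lookup table, but its answers are anchored to its training data in a similar way.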
“We need the background,” Turpin says. “Without the background it would stop working.” Pointed at an entirely new scene, he says, the system would likely produce a mistaken image much like the scene on which it trained.
The temporal imaging system has some advantages over ordinary imaging, Turpin says. For example, the new system could be extremely fast, potentially working at 1000 frames per second. Such crude but rapid 3D imaging could have various applications, Turpin says. (Think, applying the brakes in an automated car.) The system is also cheap and simple. In theory, a technophile might be able to surveil a room using an ordinary laptop and the radio antenna from a wireless router, Turpin says.
Still, Waller says it’s not clear how useful the system will be, given that actual cameras are already fairly inexpensive. Instead, she says, the experiment raises an interesting conceptual question: Precisely how does the neural network learn to make reasonable images? “Why does it work?” she asks. “What’s the physics that it’s picking up on?” The challenge, Waller says, is to go beyond using the neural network as a black box and actually figure out what it’s doing.