Last month, as much of the United States shivered in Arctic cold, weather models predicted a seemingly implausible surge of balmy, springlike warmth. A week later, that unlikely forecast came true—testimony to the remarkable march of such models. Since the 1980s, they’ve added a new day of predictive power with each new decade. Today, the best forecasts run out to 10 days with real skill, leading meteorologists to wonder just how much further they can push useful forecasts.
A new study suggests a humbling answer: another 4 or 5 days. In the regions of the world where most people live, the midlatitudes, “2 weeks is about right. It’s as close to be the ultimate limit as we can demonstrate,” says Fuqing Zhang, a meteorologist at Pennsylvania State University in State College who led the work, accepted for publication in the Journal of the Atmospheric Sciences.
Forecasters must contend with the atmosphere’s turbulent flows, which nest and build on each other as they create clouds, power storms, and push forward cold fronts. A tiny disruption in one layer of turbulence can quickly snowball, infecting the next with its error. A 1969 paper by Massachusetts Institute of Technology mathematician and meteorologist Edward Lorenz introduced this dynamic, later dubbed the “butterfly effect.” His research showed that two nearly identical atmospheric models can diverge widely after 2 weeks because of an initial disturbance as minute as a butterfly flapping its wings.
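The divergence Lorenz described can be reproduced on a laptop. The sketch below is only an illustration: it uses the well-known three-variable Lorenz system from his 1963 work, not the multiscale model of the 1969 paper, and the function names, starting point, and perturbation size are assumptions chosen for the example. Two runs start a hair's breadth apart, and the code tracks how far apart they drift.

```python
import math

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the classic three-variable Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def twin_run(perturbation=1e-8, steps=5000, dt=0.01):
    """Run two copies that differ only by `perturbation` in x.

    Returns the distance between the two states after every step.
    """
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)  # the "butterfly": a tiny nudge
    distances = []
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        distances.append(math.dist(a, b))
    return distances

if __name__ == "__main__":
    d = twin_run()
    for step in (0, 1000, 2000, 3000, 4000):
        print(f"step {step:5d}: separation {d[step]:.3e}")
```

In this toy system the gap stays microscopic for a long stretch and then grows until the two runs look unrelated, the same qualitative behavior that limits real forecasts.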
“That was a revolutionary insight,” says Richard Rotunno, a meteorologist at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, who was not involved in the new study. If real, this 2-week descent into chaos would set a fundamental limit to the atmosphere’s predictability.
Lorenz’s idea has been validated in theory. But until recently, global weather prediction models lacked the high resolution needed to test it by recreating the storm-forming processes driving the atmosphere’s chaos. Zhang hoped that the next generation of supercomputer-powered weather models, including those run by the European Centre for Medium-Range Weather Forecasts and the U.S. National Weather Service (NWS), would provide a credible test. He and his colleagues persuaded the weather agencies to let them spend expensive computing cycles running nearly identical simulations of two real-life weather events.
Typically, weather models are fed observations from satellites, balloons, and other outposts, which are combined into a starting picture of the atmosphere known as the initial conditions. These renderings are far from perfect, and it’s difficult to know whether a model’s growing unreliability as it runs stems from its mismatch with reality or from atmospheric chaos. Improving how these observations are assimilated into computer models has played a big part in improving forecasts, and it has helped the European model outdo its competitors.
The European model, like most of its peers, accounts for the remaining uncertainties in its initial conditions by running multiple versions of an event side by side, each with a slightly tweaked start, to come up with a consensus forecast. In Zhang’s experiments, he reduced this variation tenfold, essentially pretending that the model had a near-perfect view of the weather. He and his colleagues then ran the European model 120 times, with each run simulating 20 days, to recreate two large-scale weather events: a December 2015 cold snap in Northern Europe and June 2016 downpours in China. They also ran the cold snap using the next version of the U.S. Global Forecast System, which—barring another government shutdown—should deploy to forecasters next month.
On both models, the renditions steadily diverged until—at the 2-week mark—they appeared wholly unrelated. In effect, the models’ forecasting skill fell to zero at that point. “It’s a very credible result,” says Eugenia Kalnay, a meteorologist at the University of Maryland in College Park who previously led the NWS’s modeling arm. Some researchers doubted Lorenz’s model, given that it lacked some important atmospheric features, she says, but this shows the underlying idea is sound. “It’s nice because it’s simple.”
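The logic of the ensemble experiment described above can be sketched with a toy. The example below is a stand-in, not the European or U.S. model: it uses the chaotic logistic map as a one-number "atmosphere," and every name and setting in it (`run_ensemble`, `steps_until_spread`, the 20 members, the threshold) is an assumption made for illustration. Each member starts from a slightly different value, the ensemble mean plays the role of the consensus forecast, and the spread among members shows when they stop agreeing.

```python
import statistics

def logistic(x, r=4.0):
    """One step of the logistic map, a simple chaotic toy system."""
    return r * x * (1.0 - x)

def run_ensemble(center, spread, members=20, steps=60):
    """Run `members` copies started evenly across center +/- spread.

    Returns a list of (ensemble mean, ensemble standard deviation),
    one pair per step, starting with the initial state.
    """
    states = [
        center + spread * (2 * i / (members - 1) - 1) for i in range(members)
    ]
    history = []
    for _ in range(steps):
        history.append((statistics.fmean(states), statistics.pstdev(states)))
        states = [logistic(x) for x in states]
    return history

def steps_until_spread(history, threshold=0.1):
    """First step at which the members disagree by at least `threshold`."""
    for step, (_, spread) in enumerate(history):
        if spread >= threshold:
            return step
    return None

if __name__ == "__main__":
    baseline = run_ensemble(center=0.3, spread=1e-5)
    reduced = run_ensemble(center=0.3, spread=1e-6)  # tenfold smaller tweak
    print("baseline ensemble disagrees at step", steps_until_spread(baseline))
    print("reduced ensemble disagrees at step", steps_until_spread(reduced))
```

Comparing the two printed steps shows how much extra agreement a tenfold cut in starting uncertainty buys in this toy. It says nothing about the 2-week figure itself, which depends on the physics of the real atmosphere.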
Two weeks may not be the absolute limit, Rotunno says. A similar exercise run last year on NCAR’s next-generation model found that its simulations started diverging between 2 and 3 weeks. However, that model is not as battle-tested as the European gold standard, and the study could afford only a few runs, limiting its sample size. “At a practical level, they’re not going to issue those 3-week forecasts,” Rotunno says.
Still, Zhang adds, it’s heartening to know that there’s room to improve on the gains of the last few decades. He saw those benefits firsthand last month when his airline suggested he rebook a flight to London 5 days in advance due to a potential snowstorm. He heeded the forecasters’ advice and had an enjoyable extra day in London. His original flight? Canceled.
*Correction, 19 February, 12:55 p.m.: An earlier version of this story misstated the timing of the simulated cold snap and implied the U.S. model ran both weather events, rather than only one.