Feature: Can your mobile phone make you healthier?

John Holcroft

Fitbit activity trackers come with a “manifesto.” Over a photo of a fierce-eyed jogger, the company’s website proclaims: “Every moment matters and every bit makes a big impact. Because fitness is the sum of your life.”

Roughly 20 million people have found that message compelling enough to order a Fitbit. Many more seek out other devices and smartphone apps designed to count their steps, their calories, or their hours of sleep; to help them quit smoking, drinking, or stressing; or to help manage chronic illness. The distillation of daily life into a motivational stream of stats has become a booming industry—the world of the quantified self.

This life-tracking craze has produced something that many clinical researchers covet: a deluge of intimate data about individuals’ moment-to-moment behavior in “the wild,” as researchers sometimes call the world outside the controlled environment of the lab or the clinic. “It’s sort of opening a window into parts of people’s lives we haven’t really had access to before,” says Ida Sim, a physician and informaticist at the University of California, San Francisco.

Once, peeking through that window required equipping subjects with elaborate motion and heart rate monitors designed specifically for research. But now that roughly two-thirds of U.S. adults own smartphones equipped with GPS systems, cameras, and light and motion sensors, “people are thinking, ‘Oh, well maybe I could just get [data] off somebody’s phone while it’s in their purse,’” Sim says. “It’s bringing in a whole new group of people who are asking new questions.”

For example, researchers wonder whether they can finally discover just how much exercise, and what kind, makes for a healthy heart, and what strategies help people stop smoking for good. For many researchers, the hope is that mobile devices will allow them to go beyond collecting data to influencing behavior on a massive scale. Through activities built into an app or strategically timed alerts and messages, researchers can attempt to monitor and modify the habits of thousands of people simultaneously. Major university health centers and government funding agencies hope “mHealth” will finally make a dent in intractable public health problems, from obesity to tobacco use to depression. Sim, for example, collaborates on a 4-year, 11-university project, funded by a $10.8 million grant from the U.S. National Institutes of Health (NIH), that aims to design new analytical tools for interpreting mobile data and to use them to combat disease. The team is already developing mobile technologies to help people manage congestive heart failure and quit smoking.

But harnessing the self-tracking trend to promote healthier behavior is far from a sure bet. The world of commercial self-improvement apps is “the world of the cowboys,” says clinical health psychologist Bonnie Spring of Northwestern University Feinberg School of Medicine in Chicago, Illinois. Commercial app designers “are really unbothered by the kind of standards of evidence that we care about,” says Spring, who studies behavioral treatments for obesity and tobacco addiction and collaborates on the NIH project. Few commercial apps have actually been shown to help change users’ behavior, improve their health, or even take accurate measurements.

Researchers hoping to bring rigor to the Wild West of mobile sensors are still wading through fundamental questions: Do the raw data from a phone or wearable device reliably measure behavior? Does getting feedback about their behavior really help people change it? And how do you keep the download-happy masses from quickly losing interest or ignoring your app? Sim says it’s hard not to let expectations about mobile health research soar beyond the evidence. “I think right now it’s still a lot of excitement, and a lot of hype.”

IN THE 1960s, walking clubs in Japan adopted a new fad: a commercial pedometer called manpo-kei, literally meaning “10,000-step meter.” Researchers were soon exploring the health benefits of the handy-but-arbitrary goal of 10,000 steps per day. Today, it’s the default goal on every new Fitbit.

But scientists still don’t know whether it’s the right exercise goal, says Euan Ashley, a cardiologist at Stanford University in Palo Alto, California. “Is it better to do vigorous exercise on weekends, or is it better to accumulate 10,000 steps a day? We don’t know,” he says. “It’s almost like we have something more powerful than any drug that we have for cardiovascular disease—physical activity—but we don’t know how to dose it.” He believes that the answer could lie in mobile health data.

Research that relates behavior to health has often relied on crude surveys that ask patients to remember and report what they’ve been up to. “‘What did you do on Monday? How many flights of stairs did you do on Tuesday?’ That’s literally how these studies are carried out,” Ashley says. “I can barely remember what I had for breakfast, never mind what I did last Wednesday.” Even big, successful, longitudinal studies like the famous 67-year-old Framingham Heart Study have relied on occasional surveys to spot correlations between behaviors and measures of health.

Other studies have taken people out of “the wild” for stints of close observation. Participants in sleep research may spend days or weeks in the lab, sometimes wired up with sensors or lying in magnetic resonance imaging scanners, for example. But the effort and cost of recruiting and compensating subjects make large-scale studies impossible.

Mobile phones and wearable sensors offer a much cheaper way to get huge sample sizes … if they measure what they say they measure. “If we’re going to do science with these devices, we really want to validate them ourselves,” says Ashley, who is in the middle of that unglamorous task. He has rounded up all the major commercial fitness trackers to see how they compare with clinical-grade equipment on their measures of heart rate and calories burned. His preliminary finding, which matches other recent studies, is that devices tend to agree on heart rate, but calorie counts are “kind of all over the place.”

Ashley is also experimenting with a new system for gathering health and activity information from iPhone users. He’s one of more than a dozen investigators who have launched apps using ResearchKit, Apple’s open-source software platform for scientists, unveiled in March. His team’s app, called MyHeart Counts, pulls data from phone accelerometers, which track daily step counts and can record participants’ performance on a 6-minute test of walking speed. The researchers can then explore how those readings correlate with participant-reported cardiovascular risk factors, diet, and mood. In its first month, the app recruited 30,000 participants, all of whom opted to share data through an informed consent form on their phones. By now more than 47,000 have signed up.

Ashley is just beginning to analyze the data, but his group is already developing a new version of the app. It turns the phone from a monitor into a coach, nudging participants to do more exercise.

Psychologists like Spring welcome such efforts to wrestle behavior-change strategies onto our tiny screens. They say that although pocket-sized counseling or coaching is unlikely to replace traditional in-person sessions, it might extend the reach of such interventions. “We know that the people we’re helping are in some ways those least in need—the ones who can afford to come, who have the time, who can pay for the parking,” Spring says.

In one early attempt, Spring and her colleagues designed an app that draws on principles of the Diabetes Prevention Program—a clinically tested curriculum that she calls “the most successful weight loss approach ever.” Central to that approach is getting participants to religiously count their fat and calorie intake and track their weight, Spring says, which can be a challenge. Hoping to keep users engaged, her group’s app depicts calorie and fat allowances in colorful meters that fill up over the course of the day.

Other researchers are testing apps meant to help recently abstinent smokers avoid relapse. A U.K.-based program called txt2stop simply sends tailored text messages, such as “Day4=Big day - cravings still strong? Don’t worry tomorrow will be easier! Keep your mind & hands busy.” Users can text the word “crave” at any time to get additional reinforcement, and “lapse” if they have smoked and need coaching through a slip-up.

That approach depends on users to reach out or engage with their phones when they’re tempted, however. “When people need the most help, they are not the ones likely to ask for it,” says Santosh Kumar, a computer scientist at the University of Memphis in Tennessee who leads the big NIH-funded project known as Mobile Data to Knowledge (MD2K). Ideally, he says, an app would sense the user’s context—including the presence of potential temptations—to figure out when someone needs guidance and then provide a so-called “just-in-time intervention.”

MD2K collaborators are working on one such system. It will infer stress—a known risk factor for a lapse in attempts to quit smoking—from heartbeat intervals in electrocardiogram data collected by a chest band. (Kumar notes that data from a smartwatch could also work, provided the watch’s heart monitor delivers reliable data.) The MD2K system will also detect when people smoke without their having to report it, by combining breathing patterns from the chest band with readings of arm motion gathered by a motion-sensing wristband. The hope is that stress-relieving exercises can be timed to moments when a person is most vulnerable to an urge or most receptive to encouragement.
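To make the idea of reading stress from heartbeat intervals concrete, here is a minimal Python sketch. It is not the MD2K method, which the article does not describe in detail. It uses RMSSD, a standard heart rate variability statistic, as a stand-in, and the fixed cutoff `STRESS_THRESHOLD_MS` and the function names are hypothetical, chosen only for illustration.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between heartbeat intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two heartbeat intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical cutoff, for illustration only. A real system would likely
# calibrate against each wearer's own baseline rather than use one number.
STRESS_THRESHOLD_MS = 20.0


def looks_stressed(rr_intervals_ms, threshold_ms=STRESS_THRESHOLD_MS):
    """Flag a window of heartbeats as possibly stressed when variability is low."""
    return rmssd(rr_intervals_ms) < threshold_ms


steady_window = [800, 810, 790, 805, 795]    # intervals barely change beat to beat
variable_window = [800, 850, 780, 860, 790]  # intervals swing widely

print(looks_stressed(steady_window), looks_stressed(variable_window))
```

The sketch treats low beat-to-beat variability as a possible sign of stress, which is a simplification. A deployed system would need to account for movement, sensor noise, and differences between individuals.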

Another context-dependent smoking app, called Q Sense, is under development in the lab of Felix Naughton, a health psychologist at the University of Cambridge in the United Kingdom. The app first uses a phone’s GPS system to tune into a person’s habits and learn where they are most likely to smoke—for example, in the pub or outside the workplace. Once people start to quit, they’ll receive tailored messages of encouragement when they breach a certain radius of these locations. Being in the workplace, for example, might trigger instructions for a stress-reduction technique.
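The location trigger described above amounts to a geofence check, which can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Q Sense's actual code: the function names, the sample coordinates, and the 100-meter default radius are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def triggered_zones(position, zones, radius_m=100.0):
    """Return the names of learned smoking locations within radius_m of position."""
    lat, lon = position
    return [
        name
        for name, (zone_lat, zone_lon) in zones.items()
        if haversine_m(lat, lon, zone_lat, zone_lon) <= radius_m
    ]


# Hypothetical locations learned during the app's observation phase.
learned_zones = {
    "pub": (52.2053, 0.1218),
    "workplace": (52.2100, 0.0900),
}

# A GPS fix a few meters from the pub would trigger only that zone's message.
print(triggered_zones((52.2054, 0.1219), learned_zones))
```

A larger radius would give users earlier warning but also more false alarms, a trade-off the sketch leaves to the `radius_m` parameter.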

DESPITE THE FLURRY OF RESEARCH, the first generation of behavior-change apps has a spotty report card. The U.K. antismoking program txt2stop showed some benefit in a randomized study of 5524 participants. It doubled the rate of successful quit attempts after 6 months, from about 5% in the control group to about 10% of text recipients. Although that may sound like meager progress, it’s cost-effective for health systems: For every 1000 enrolled participants, the service costs £16,120 but gains about 18 life-years.
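The arithmetic behind that cost-effectiveness claim can be worked through in a short Python sketch. It assumes that the £16,120 figure, like the 18 life-years, is per 1000 enrolled participants, and it uses the article's rounded quit rates of 5% and 10%. The function and key names are hypothetical.

```python
def cost_effectiveness(cost_per_1000, life_years_per_1000,
                       quit_rate_control, quit_rate_treated):
    """Derive simple per-person figures from per-1000-participant totals."""
    extra_quit_rate = quit_rate_treated - quit_rate_control
    extra_quitters_per_1000 = 1000 * extra_quit_rate
    return {
        # How many people must be enrolled to produce one additional quitter.
        "number_needed_to_treat": 1 / extra_quit_rate,
        "cost_per_extra_quitter": cost_per_1000 / extra_quitters_per_1000,
        "cost_per_life_year": cost_per_1000 / life_years_per_1000,
    }


# Rounded inputs taken from the figures quoted above (costs in pounds).
result = cost_effectiveness(16_120, 18, 0.05, 0.10)
for name, value in result.items():
    print(f"{name}: {value:.0f}")
```

On those rounded inputs, about 20 people must be enrolled for each additional quitter, at roughly £320 per extra quitter and a little under £900 per life-year gained.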

Spring’s weight loss app, meanwhile, inspired an astounding level of self-tracking in its users. They entered their weight on more than 90% of days, she says. “I’ve never seen this. It was unimaginable.” But when she compared app users with people who tracked their weight and food intake with paper and pencil, the app seemed to provide no additional benefit in terms of pounds lost—both groups saw modest weight loss.

Spring suspects that self-tracking makes users more careful with their diets, but only to a point. Perhaps participants maxed out the benefit they could get from seeing their own data, so the app provided no advantage. If the researchers want more clinical improvements, Spring says, they’ll have to add in some other approach.

Other app studies have struggled to reveal any long-term benefits at all. A recent meta-analysis of 14 mobile weight interventions found an average weight loss of only about 1.4 kilograms compared with control groups. And a 2013 review of 21 randomized controlled trials of mobile interventions for obesity, diabetes management, smoking, and other health challenges found that fewer than half led to improvements in a relevant measure of health.

“Frankly, this is so new that I’m not sure that we know that it works—that it makes a difference,” says Arthur Stone, a behavioral scientist at the University of Southern California in Los Angeles, who co-authored the review. In the 1990s, Stone was an early pioneer of real-time health tracking, having developed a method known as “ecological momentary assessment” that encourages participants to log their activity and describe their moods right when they experience them. The goal was to give researchers a more detailed picture of subjects’ psychological symptoms. But as smartphones take data gathering to its extreme, he finds himself among the skeptics. “Do we need the incredible density of data that we seem to automatically want to go to?” he wonders. “A lot of times, we’re measuring things because we can measure them, and we don’t know exactly why we’re measuring them.”

The new generation of just-in-time interventions faces other hurdles. In a recent feasibility study to learn how smokers would use his Q Sense app, Naughton found that about half the time, users didn’t open the app for more than 30 minutes after they received a notification. That means the intervention likely wasn’t reaching people at the intended moment.

The question of how and when a phone should interrupt a person has become a field of study in itself. Computer scientist Veljko Pejović at the University of Ljubljana and colleagues have tried to model users’ “interruptibility” by gathering their feedback about messages and alerts at various points in the day. So far, his results can’t offer a generalizable strategy. “It’s very personalized,” he says. People may engage with or ignore a message based on their location, the time of day, the kind of activity they’re involved in, and whether they’re starting or finishing a task.

The MD2K team worries that users won’t be able to focus on an alert when they need it most: at times of stress. So in January they’ll launch a new study of their system, involving 75 smokers, which will be “microrandomized.” A given user will sometimes receive an alert telling him or her to do a stress management exercise at moments of high stress; at other times, the alert will arrive when stress is deemed low. Combined with records of smoking from wearable sensors, the data might reveal which strategy has the greatest impact.
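A rough Python sketch of what microrandomization might look like follows. It assumes that at every decision point the system randomly decides whether to send an alert, and that each decision is logged with the stress level detected at that moment. The function names, the 50% alert probability, and the data layout are hypothetical and are not drawn from the MD2K study design.

```python
import random
from collections import defaultdict


def microrandomize(decision_points, p_alert=0.5, seed=None):
    """Randomly decide, at each (time, stress) decision point, whether to alert."""
    rng = random.Random(seed)
    return [
        {"time": time, "stress": stress, "alert_sent": rng.random() < p_alert}
        for time, stress in decision_points
    ]


def smoking_rate_by_arm(log, smoked_after):
    """Share of decision points followed by smoking, per (stress, alert_sent) arm."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [lapses, total decision points]
    for entry, smoked in zip(log, smoked_after):
        arm = (entry["stress"], entry["alert_sent"])
        counts[arm][0] += int(smoked)
        counts[arm][1] += 1
    return {arm: lapses / total for arm, (lapses, total) in counts.items()}


# Hypothetical day for one participant: four decision points and whether the
# wearable sensors detected smoking shortly after each one.
points = [(9, "high"), (11, "low"), (14, "high"), (17, "low")]
log = microrandomize(points, seed=2015)
print(smoking_rate_by_arm(log, smoked_after=[True, False, False, True]))
```

Because the same person is randomized many times, comparing the arms across thousands of decision points could show whether alerts help more at high-stress or low-stress moments.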

It may turn out that no one behavior-change strategy will work for everyone. The MD2K team, for example, plans to eventually personalize the timing of alerts for each participant. As apps bring in richer data about each individual user, it’s becoming clear that “what predicts behavior in groups doesn’t necessarily predict behavior in individuals,” Naughton says.

Rosalind Picard, a computer scientist at the Massachusetts Institute of Technology (MIT) in Cambridge, is developing highly personalized interventions that would be sensitive to users’ state of mind.

[Graph adapted from the SNAPSHOT Study, MIT Media Lab Affective Computing Group, http://snapshot.media.mit.edu/. Credit: C. Smith/Science]

Picard was motivated by a tragedy: In 2013, she learned that a former graduate student had taken his own life. Her team started thinking about how wearable sensors could relieve stress and prevent depression. “It’s one thing to study all this,” she says. “It’s another to build it into a form that people can start changing their lives around.” With support from a memorial fund organized by her former student’s mother, Picard’s lab has begun studying work-related stress and strategies for relieving it.

A first step, published this year, tracked a group of MIT undergrads over 30 days, collecting data from wearable wrist sensors on movement, skin conductivity, and temperature, as well as smartphone records of location, calls, and text messages. The team then related these measurements to self-ratings of stress, health, energy, alertness, and happiness (see graph, above). Initial insights were far from shocking: Spending extra time outdoors and getting ample, consistent sleep were among the factors predictive of happiness, for example.

But the technology Picard envisions down the road is more elaborate: a system that trains itself to forecast an oncoming anxiety attack or a bout of depression, based on sensor-derived signals that are unique to an individual, and that alerts wearers when they might be in trouble. For example, if a person’s sensor and mobile data showed that she was sleeping irregularly or using her phone late at night at times when she felt down, the system might automatically send a reassurance or suggest that she get more sleep.

It remains to be seen whether such a system could be reliable, not to mention how government regulators would view it. “We’re kind of where weather forecasting was 150 years ago. People looked at the farmer’s almanac, and then the city got wiped out that night by a storm and they didn’t see it coming,” Picard says. But she thinks the technology is now good enough for researchers to think about predicting human behavior. “It’s not as good as weather forecasting yet, but it’s better than random.”