
Immigrant advocacy groups have been encouraging hard-to-reach populations to participate in the 2020 census. Here, a volunteer knocks on a door in Maryland.

Marvin Joseph/The Washington Post via Getty Images

Can the Census Bureau actually meet Trump’s demand to identify noncitizens?

The U.S. courts and Congress might be the ultimate arbiters of the fate of President Donald Trump’s new order to exclude undocumented residents from the 2020 census tally now underway. At stake is how many seats each state gets in the 435-seat U.S. House of Representatives, a decision that could shape the country’s political landscape for the next decade.

But demographers and statistical scientists also have a dog in the fight: the integrity of the Census Bureau. Researchers worry the bureau’s effort to satisfy the president’s wishes could sully its gold-plated reputation by assigning it a task no statistical agency can carry out.

The agency’s job is to conduct a complete and accurate census and report out the results. But Trump has ordered “the Census Bureau to develop numbers [for undocumented residents] that are almost impossible to determine,” says Rob Santos, incoming president of the American Statistical Association and chief methodologist at the Urban Institute. “Nobody knows how to do that.”

The president’s 21 July memo follows up on a July 2019 executive order that gave the Census Bureau a different, but related, assignment: Figure out how many residents are U.S. citizens. The agency is supposed to do that by using existing government records, such as Social Security, unemployment, and motor vehicle information. Both directives are a response to a June 2019 ruling by the Supreme Court blocking the Trump administration’s attempt to get the answer by asking about citizenship on the 2020 census.

The Census Bureau is already facing huge challenges in completing the decennial count, which officially began on 1 April and will cost an estimated $15 billion. The coronavirus pandemic has delayed deployment of an army of enumerators to track down those who have yet to answer its 10 questions. And that massive follow-up exercise could be compromised if residents are afraid of contracting COVID-19 from a stranger who knocks on their door.

Demographers say some of those households are already on edge, fearing the citizenship question is part of the larger push by the Trump administration to crack down on illegal immigration. This spring, officials reported a lower-than-expected response rate among traditionally hard-to-count populations. As a result, more aggressive follow-up efforts than normal will be needed this fall to avoid a serious undercount.

A turn to administrative records

Trump’s 2019 executive order to the secretary of commerce, who oversees the Census Bureau, was widely seen as an attempt to get around the Supreme Court’s ruling by obtaining citizenship data through government records. The new memo goes a step further. In essence, it orders Census officials to subdivide the category of noncitizens into two pots: those who are in the country legally and those without proper documentation. The commerce secretary would then give the president a revised tally of each state’s population that excludes undocumented residents for purposes of apportionment.

Many legal scholars believe subtracting undocumented residents from the apportionment count violates a constitutional requirement to count every resident, regardless of legal status. Moreover, many statisticians believe that generating accurate counts for all three citizenship variables—citizens, documented immigrants, and undocumented residents—is a bridge too far for the agency.

The Census Bureau’s chief scientist, John Abowd, “has said that he can count the number of citizens,” Santos notes. But that’s not what Trump is now asking for. “[Abowd] has never said that he can count the number of undocumented residents,” Santos says. “And he’d be crazy to promise that.”

Separating out noncitizens

Documented versus undocumented might sound like a subtle distinction. But it’s actually a big deal, says economist Amy O’Hara, who oversaw the agency’s efforts to use administrative records before joining the faculty of Georgetown University in 2018.

None of the existing government records was designed to do what Trump wants, O’Hara notes. Some don’t contain current citizenship status, for example, whereas others don’t say where the person is living. (The decennial census assigns each person to a specific address when it includes them in the overall count.) The revised tally to be used in apportionment must be delivered by the end of the year, and O’Hara says it’s hard to see how the Census Bureau could obtain, vet, and generate numbers that meet its rigorous standards for accuracy by that deadline.

“I don’t know what set of data sources the Bureau could identify for that purpose,” O’Hara says. “And for the ones they have, it’s not clear how they would operationalize them.”

Another vexing problem is that even records that include citizenship status don’t necessarily distinguish between those in the country legally and undocumented residents. “To produce a good number, you need to be able to draw a clear line between those two categories [documented and undocumented], based on transparency and outside scrutiny,” O’Hara says. “But that sharp definition doesn’t exist” in the administrative records available to the Census Bureau.

Government agencies that track individuals coming into the country don’t necessarily update their records when a person shows up in the files of another agency responsible for a different component of the country’s complex immigration system. As a result, Santos says, existing records are likely to lowball the estimated number of people holding legal status.

Those without proper documentation, Santos says, face “a double whammy” that stems from both their hard-to-count status in the census and their spotty paper trail.

The Census Bureau is using Spanish-language yard signs to reach some people who have yet to fill out the 2020 census.

J. Mervis/Science

Here’s his scenario: The census will first fail to count many undocumented residents, he predicts. Then, when the bureau tries to estimate the number of noncitizens for purposes of apportionment, it will put too many people in the “undocumented” category.

In practice, such a flawed tally could mean fewer House seats for states that have large immigrant populations, such as Texas. Conversely, it could benefit states whose initial census tally is close to the mark and that are determined to have relatively few undocumented residents.
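To see how subtracting residents from one state's total can move a seat, here is a minimal sketch of the method of equal proportions, the formula used to divide House seats since the 1940s. The three states, their populations, the 10-seat chamber, and the 800,000 excluded residents are all made up for illustration; none of these figures comes from the Census Bureau or from the memo.

```python
import heapq
from math import sqrt

def apportion(populations, seats):
    """Divide `seats` among states using the method of equal proportions."""
    # Every state is guaranteed one seat before the rest are handed out.
    alloc = {state: 1 for state in populations}
    # A state's priority for its next seat is pop / sqrt(n * (n + 1)),
    # where n is the number of seats it already holds.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)  # highest priority wins the seat
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc

# Hypothetical populations for a 10-seat chamber.
full_count = {"A": 6_000_000, "B": 3_300_000, "C": 1_000_000}
# Same states after removing a hypothetical 800,000 residents from state A.
revised = dict(full_count, A=full_count["A"] - 800_000)

print(apportion(full_count, 10))  # {'A': 6, 'B': 3, 'C': 1}
print(apportion(revised, 10))     # {'A': 5, 'B': 4, 'C': 1}
```

In this toy case, state B gains a seat without adding a single resident, because the last seat goes to whichever state has the highest remaining priority value.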

Tweaking the numbers

A further complexity is that the decennial census never hears from every U.S. resident. For those it misses, which it identifies by checking responses against a master list of household addresses, it uses a process of filling in the blanks known as imputation.

Imputation is based on knowing enough about the likely characteristics of residents who failed to respond to create a demographic profile—with age, sex, and race, as well as address—for each one of them. The imputation rate for the 2010 census was 2%, for example. But the unique environment for this year’s census could take officials into uncharted waters in applying that technique.

One worry is that the undercount could be historically high because of the pandemic and other factors. “Imputing [the characteristics for] even 5% of households would be unprecedented,” O’Hara says. An additional problem is that the demographics of those not responding this year might be different from those in past censuses, raising questions about the accuracy of the underlying assumptions.
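The basic idea of imputation can be sketched with a simple "nearest neighbor" rule: a household that did not respond borrows its profile from the closest one that did. This is only an illustration of the concept, not the Census Bureau's actual procedure; the addresses, household sizes, and the donor rule itself are assumptions.

```python
def impute_household_sizes(addresses):
    """Fill in missing household sizes from the nearest responding address.

    `addresses` is a list of (address, size) pairs in street order, with
    size set to None for households that did not respond.
    Returns (address, size, was_imputed) triples.
    """
    responded = [(i, size) for i, (_, size) in enumerate(addresses)
                 if size is not None]
    if not responded:
        raise ValueError("need at least one responding household")
    result = []
    for i, (address, size) in enumerate(addresses):
        if size is None:
            # Borrow from the closest responder; ties go to the earlier one.
            _, donor_size = min(responded, key=lambda r: abs(r[0] - i))
            result.append((address, donor_size, True))
        else:
            result.append((address, size, False))
    return result

def imputation_rate(result):
    """Share of records whose values were filled in rather than reported."""
    return sum(1 for _, _, imputed in result if imputed) / len(result)

# Hypothetical block with two nonresponding households.
block = [("101 Elm", 2), ("103 Elm", None), ("105 Elm", 4),
         ("107 Elm", None), ("109 Elm", 1)]
filled = impute_household_sizes(block)
print(filled)
print(imputation_rate(filled))  # 0.4
```

The sketch also shows why a shift in who fails to respond matters: every imputed value is only as good as the assumption that missing households resemble their donors.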

Privacy complications

Imputation isn’t the only technical challenge facing Abowd’s research team. They are also responsible for ensuring that none of the massive amount of demographic information the agency releases from the decennial census can be used to identify individuals.

Traditionally, the way to guarantee anonymity is to add statistical “noise,” that is, to alter data—a person’s age or the size of the household, say—collected from the smallest subgroups while retaining accuracy at larger scales. The 2020 census will use a new mathematical approach, called differential privacy, to achieve that goal.

Census officials say it provides absolute protection against the type of reverse engineering that can be used to tease out personal identities from massive databases. It does so by setting a privacy “budget” that balances the distortions needed to ensure privacy against the granularity of the data needed to answer questions posed by demographers.

The Census Bureau has been testing its new disclosure avoidance system (DAS) using data from the 2010 census. Those trial runs have revealed the DAS can introduce unwanted deviations, and that data on less-populated areas and among certain demographic groups are especially prone to inaccuracies. It has also generated some impossible results, such as negative population counts for some of the smallest census units, which correspond to a handful of city blocks, or a lack of elderly residents in an area where a nursing home is located.
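A toy example shows why small areas are the most exposed and how a negative count can arise. The sketch below adds Laplace noise, a textbook way of achieving differential privacy, to a single count; it is not the bureau's DAS, and the privacy parameter `epsilon = 0.25` and the two population figures are assumptions chosen only for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at zero."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Return a privacy-protected version of a count."""
    # Adding or removing one person changes a count by at most 1, so the
    # noise scale is 1 / epsilon. A smaller epsilon (a tighter privacy
    # budget) means more noise.
    return round(true_count + laplace_noise(1.0 / epsilon, rng))

def share_negative(true_count, epsilon, trials, rng):
    """Fraction of noisy releases that come out below zero."""
    draws = [noisy_count(true_count, epsilon, rng) for _ in range(trials)]
    return sum(1 for d in draws if d < 0) / trials

rng = random.Random(2020)  # fixed seed so the run is repeatable
print(share_negative(3, 0.25, 10_000, rng))       # a block with 3 residents
print(share_negative(50_000, 0.25, 10_000, rng))  # a county with 50,000
```

With these settings, about one in five releases for the three-person block comes out negative, while the same noise is negligible next to a count of 50,000. The size of the noise depends on the privacy budget, not on the size of the population being counted.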

None of that is surprising, the Census Bureau’s Matthew Spence explained during a recent webinar hosted by the Committee on National Statistics of the National Academies of Sciences, Engineering, and Medicine. Census officials always planned to follow an iterative process in working out the kinks in the privacy system, he said. New methods, he added, including running the data through multiple iterations before spitting out the results, have yielded numbers much closer to the actual results.

But it’s not clear whether the Census Bureau will use the same iterative process in following Trump’s directive to calculate the number of undocumented residents and remove them from the total count obtained from the 2020 census. Census officials haven’t yet said whether differential privacy will be applied in coming up with the revised state population totals needed for apportionment, or whether there will be time to do so. And that’s another red flag for statisticians.

“The privacy budget will need to be spread over all three categories of citizenship variables, and for 100% of the records,” O’Hara says. “And they will be doing it using an untested methodology.”

Will the count stand?

Several statisticians who were reluctant to speak publicly told ScienceInsider that they are rooting for the Supreme Court to torpedo Trump’s effort. “That could make all of this moot,” one says.

But if that doesn’t happen, or if the Trump administration is able to force Census to release numbers before any judicial ruling is handed down, O’Hara says the presidential memo itself contains a potential loophole.

“It says that ‘the Secretary shall take all appropriate action, consistent with the Constitution and other applicable law …’ to provide the necessary information to the president,” she notes. “So, I’m wondering if, in this challenging environment, the [Census Bureau] director would stand behind data obtained in this questionable manner. Or would he decide that they don’t meet Census Bureau standards and that he cannot transmit them to the president, as directed?”