The health care bill winding its way through the U.S. Senate is just one of thousands of pieces of legislation Congress will consider this year, most doomed to failure. Indeed, only about 4% of these bills become law. So which ones are worth paying attention to? A new artificial intelligence (AI) algorithm could help. Using just the text of a bill plus about a dozen other variables, it can estimate with great precision the chance that a bill will become law.
Other algorithms have predicted whether a bill will survive a congressional committee, or whether the Senate or House of Representatives will vote to approve it—all with varying degrees of success. But John Nay, a computer scientist and co-founder of Skopos Labs, a Nashville-based AI company focused on studying policymaking, wanted to take things one step further. He wanted to predict whether an introduced bill would make it all the way through both chambers—and precisely what its chances were.
Nay started with data on the 103rd Congress (1993–1995) through the 113th Congress (2013–2015), downloaded from a legislation-tracking website called GovTrack. This included the full text of the bills, plus a set of variables, including the number of co-sponsors, the month the bill was introduced, and whether the sponsor was in the majority party of their chamber. Using data on Congresses 103 through 106, he trained machine-learning algorithms (programs that find patterns on their own) to associate bills’ text and contextual variables with their outcomes. He then predicted how each bill would do in the 107th Congress. Then, he trained his algorithms on Congresses 103 through 107 to predict the 108th Congress, and so on.
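The train-then-predict schedule described above can be sketched in a few lines of Python. This is only an illustration of the expanding-window idea, not Nay's code; the function name `expanding_window_splits` and its parameters are hypothetical.

```python
def expanding_window_splits(first, last, min_train):
    """Yield (training Congresses, test Congress) pairs.

    Each round trains on every Congress from `first` up to the one
    just before the test Congress, so the training set grows by one
    Congress per round. Names and parameters are illustrative.
    """
    for test in range(first + min_train, last + 1):
        yield list(range(first, test)), test


# Congresses 103 through 113, with the first four used for the
# initial training round (as the article describes).
splits = list(expanding_window_splits(103, 113, 4))
for train, test in splits:
    print(f"train on {train[0]}-{train[-1]}, predict {test}")
```

Testing each Congress only against models trained on earlier ones keeps the predictions honest: the algorithm never sees outcomes from the period it is asked to forecast.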
Nay’s most complex machine-learning algorithm combined several parts. The first part analyzed the language in the bill. It interpreted the meaning of words by how they were embedded in surrounding words. For example, it might see the phrase “obtain a loan for education” and assume “loan” has something to do with “obtain” and “education.” A word’s meaning was then represented as a string of numbers describing its relation to other words. The algorithm combined these numbers to assign each sentence a meaning. Then, it found links between the meanings of sentences and the success of bills that contained them. Three other algorithms found connections between contextual data and bill success. Finally, an umbrella algorithm used the results from those four algorithms to predict what would happen.
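For readers who want a feel for the mechanics, here is a toy Python sketch of two of those steps: turning word vectors into a sentence vector, and blending several models' probabilities into one. The word vectors are made-up numbers, averaging is just one simple way to combine them, and the weighted average stands in for the umbrella algorithm; the article does not specify how Nay's program does either step.

```python
from statistics import fmean

# Toy 3-number word vectors. The values are invented for illustration;
# a real system would learn them from a large body of text.
WORD_VECTORS = {
    "obtain": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.3, 0.1],
    "education": [0.2, 0.9, 0.1],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a sentence.

    Averaging is an assumption made for this sketch, not necessarily
    how the algorithm in the article combines word meanings.
    """
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    # zip(*...) regroups the word vectors dimension by dimension.
    return [fmean(dim) for dim in zip(*(vectors[w] for w in words))]


def combine_predictions(probabilities, weights):
    """Blend several models' probabilities with a weighted average.

    A stand-in for the umbrella step; the weights are hypothetical.
    """
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


vec = sentence_vector("obtain a loan for education", WORD_VECTORS)
blended = combine_predictions([0.02, 0.06, 0.04, 0.08], [1, 1, 1, 1])
print(vec, blended)
```

Words missing from the toy vocabulary ("a," "for") are simply skipped here, which is one of several reasonable ways to handle them.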
Because bills fail 96% of the time, a simple “always fail” strategy would almost always be right. But rather than simply predict whether each bill would or would not pass, Nay wanted to assign each a specific probability. If a bill is worth $100 billion—or could take months or years to pull together—you don’t want to ignore its possibility of enactment just because its odds are below 50%. So he scored his method according to the percentages it assigned rather than the number of bills it predicted would succeed. By that measure, his program scored about 65% better than simply guessing that a bill wouldn’t pass, Nay reported last month in PLOS ONE.
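To see why scoring probabilities differs from counting right-or-wrong calls, consider the small Python sketch below. It uses the Brier score, a standard measure of probability forecasts, as an assumed example; the article does not name the metric Nay used, and the toy numbers are invented, so the result is not meant to reproduce the 65% figure.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between predicted probability and outcome (0 or 1).

    Lower is better; 0 means every forecast was perfectly confident and right.
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_vs_baseline(model_probs, baseline_probs, outcomes):
    """Fractional improvement of the model's Brier score over a baseline's."""
    return 1 - brier_score(model_probs, outcomes) / brier_score(baseline_probs, outcomes)


# Toy data: 25 bills, of which one (4%) is enacted.
outcomes = [1] + [0] * 24
always_fail = [0.0] * 25           # baseline: every bill gets a 0% chance
model = [0.5] + [0.02] * 24        # hypothetical model probabilities

skill = skill_vs_baseline(model, always_fail, outcomes)
print(f"improvement over always-fail: {skill:.1%}")
```

The always-fail baseline is right about 24 of the 25 toy bills, yet it is penalized heavily for the one it missed with total confidence. A model that flags that bill as a coin flip scores better, even though it would also "predict failure" for every bill at a 50% cutoff.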
Nay also looked at which factors were most important in predicting a bill’s success. Sponsors in the majority and sponsors who served many terms were at an advantage (though each boosted the odds by 1% or less). In terms of language, words like “impact” and “effects” increased the chances for climate-related bills in the House, whereas “global” or “warming” spelled trouble. In bills related to health care, “Medicaid” and “reinsurance” reduced the likelihood of success in both chambers. In bills related to patents, “software” lowered the odds for bills introduced in the House, and “computation” had the same effect for Senate bills.
Nay says he is surprised that a bill’s text alone has predictive power. “At first I viewed the process as just very partisan and not as connected to the underlying policy that’s contained within the legislation,” he says.
Nay’s use of language analysis is “innovative” and “promising,” says John Wilkerson, a political scientist at the University of Washington in Seattle. But he adds that without prior predictions relating certain words to success—the word “impact,” for example—the project doesn’t do much to illuminate how the minds of Congress members work. “We don’t really learn anything about process, or strategy, or politics.”
But it still seems to be the best method out there. “Nay’s way of looking at bill text is new,” says Joshua Tauberer, a software developer at GovTrack with a background in linguistics who is based in Washington, D.C., and who had been using his own machine-learning algorithm to predict bill enactment since 2012. Last year, Nay learned of Tauberer’s predictions, and the two compared notes. Nay’s method made better predictions, and Tauberer ditched his own version for Nay’s.
So how did the new algorithm rank the many (failed) bills to repeal the Affordable Care Act? A simple, base-rate prediction would have put their chances at 4%. But for nearly all of them, Nay’s program put the odds even lower.