Researchers have developed a poker-playing computer program that can defeat even the best human players.

(Illustration) Peter and Maria Hoey/www.peterhoey.com

Texas Hold ’em poker solved by computer

Card sharks, beware. A new program cannot be beaten at a variety of poker called heads-up limit Texas Hold ’em, at least not within a human lifetime, a team of computer scientists reports. Researchers had previously developed unbeatable algorithms for other games such as checkers, but the new work marks the first time scientists have found such an algorithm for (that is, "solved") a complex game in which some information about the state of the game (here, the cards in the opponent’s hand) remains hidden from the player. The program has yielded insights that could help players improve their game, and the general approach may have real-world usefulness in security and health care applications.

Because of the hidden information and the luck of the draw, the program won't necessarily win every hand, explains computer scientist Michael Bowling of the University of Alberta in Edmonton, Canada, who led the study. But on average the program is so good that a human would have no chance of ever edging ahead of it, even if the two played 60 million hands. So “for all purposes that anyone would ever care about, we’ve solved the game,” Bowling says.

Some games are easier to solve than others. For example, in tic-tac-toe even a child can learn to force a draw every time. In contrast, it took computer scientists years and plenty of computing power to solve checkers. And either of those games is much simpler than poker for a number of reasons. In both tic-tac-toe and checkers, both players have full knowledge of the state of the game at every turn. In poker, players cannot see each other's cards. And unlike tic-tac-toe and checkers, poker involves luck, betting, and bluffing, factors that make it impossible to find a strategy that guarantees a win or a draw on every hand.

In fact, poker is so complicated that Bowling and colleagues decided to study only a relatively specialized version called heads-up limit Texas Hold ’em. In it, only two players compete and the size of bets is limited. To begin, each player places a bet and is dealt two cards. Three cards, the flop, are then laid face-up in the middle of the table. Then two more cards are dealt face-up on the table, one at a time. Each player then tries to make the best five-card hand (say, three of a kind) from his or her own two cards and those on the table. After each round of cards, a player can check, bet, match the opponent’s bet, raise that bet, or fold. During each round of betting, a player must at least match the opponent's bet to stay in the game. In the end, if no one folds, the better hand wins the pot.

The researchers developed their strategy by pitting the computer against itself in a series of training rounds. After playing itself, the computer examined its moves to see if making different choices would have improved its result. It then calculated its "regret" for not doing so, a mathematical measure of how much it lost because of its imperfect move. As the computer "practiced" against itself, it improved its strategies, and its regrets gradually diminished. In a solved game, those regrets would be zero because each move would be perfect. After the researchers trained their algorithm, the computer's regrets were so close to zero that the program couldn't be beaten in a human lifetime, they report online today in Science.
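The self-play loop described above can be sketched in a few lines. This is a minimal illustration and not the researchers' program: it applies plain regret matching to rock-paper-scissors, a toy game chosen here only because it is tiny, and every name in it (`payoff`, `strategy_from_regrets`, `train`) is hypothetical.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}


def payoff(a, b):
    """Return +1 if action a beats b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    return 1 if (a, b) in WINS else -1


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet, so mix evenly.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations, seed=0):
    """Self-play: two copies of the learner play and update their regrets."""
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(3), weights=s)[0] for s in strategies]
        for player in (0, 1):
            opp_move = ACTIONS[moves[1 - player]]
            actual = payoff(ACTIONS[moves[player]], opp_move)
            for a in range(3):
                # Regret: how much better action a would have done
                # than the move that was actually played.
                regrets[player][a] += payoff(ACTIONS[a], opp_move) - actual
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all rounds is the one that settles down.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(train(50000)):
        print(player, [round(p, 3) for p in avg])
```

With enough rounds, each player's average strategy should drift toward playing every action about a third of the time, which is the unbeatable mix for this toy game. Poker has vastly more situations to track, so this shows only the flavor of the idea.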

In this way the computer calculated a vast table of strategies covering each possible situation in a game. For every hand, the computer can look up whether it should fold or bet. Given the same hand, the program will not always take the same action, but instead will bet a certain fraction of the time and fold a certain fraction of the time. The program can even bluff: given a weak hand, the program will usually fold, but occasionally bet. Bluffing, it turns out, has a mathematical basis and can be optimized just as other actions can.
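A lookup of this kind might work roughly as in the sketch below. The hand categories and probabilities are invented for illustration and are not taken from the program's actual table; `STRATEGY_TABLE` and `choose_action` are hypothetical names.

```python
import random

# Hypothetical table: for each hand category, the fraction of the time
# to take each action. The numbers are made up for illustration only.
STRATEGY_TABLE = {
    "strong": {"bet": 1.0, "fold": 0.0},
    "medium": {"bet": 0.7, "fold": 0.3},
    "weak": {"bet": 0.1, "fold": 0.9},  # the occasional bet is a bluff
}


def choose_action(hand, rng=random):
    """Look up the hand's action probabilities and sample one action."""
    probabilities = STRATEGY_TABLE[hand]
    actions = list(probabilities)
    weights = [probabilities[a] for a in actions]
    return rng.choices(actions, weights=weights)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [choose_action("weak", rng) for _ in range(10000)]
    print("bluff rate:", picks.count("bet") / len(picks))
```

Because the action is sampled rather than fixed, an opponent who sees a bet cannot tell whether the hand behind it is strong or weak.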

Technically, the not-quite-zero value of the regret function the researchers achieved means that the game hasn't been exactly solved and that an even better program could be found. But the strategy is so good that it's essentially pointless to keep looking for a better algorithm, says computer scientist Murray Campbell of IBM’s Thomas J. Watson Research Center in Yorktown Heights, New York, who did not work on the program. In poker, "you can never get the exact, perfect solution, but you can get so close that nobody could ever tell the difference."

Phil Laak, a professional poker player based in Los Angeles, California, who has played against an earlier program from Bowling’s group, says that programs like this one are useful tools for professionals. Such programs, he says, can only improve the game and not, as some might worry, take the joy out of it. "Poker somehow grabs the imagination, and it has a romance attached to it that I think will forever exist," he said.

In fact, the program may already be providing insights into the game. The program plays a larger range of hands than professional players do, making bets with weak hands that professional players tend to fold. It has also confirmed the conventional wisdom that the dealer in each round holds an advantage. But although the new strategy cannot lose in the long run, it may not maximize winnings in all situations. When playing a weak player, the strategy will be too conservative to rake in the biggest possible winnings.
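The trade-off between being unbeatable and maximizing winnings can be checked by hand in a toy game. The sketch below uses rock-paper-scissors rather than poker, with a made-up weak opponent who plays rock 60% of the time; `PAYOFF`, `expected_value`, and `best_response` are hypothetical names introduced for this example.

```python
# Payoff to the row player. Rows and columns are ordered
# rock, paper, scissors; +1 is a win, -1 a loss, 0 a tie.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_value(mine, theirs):
    """Average payoff per round when two mixed strategies meet."""
    return sum(
        mine[a] * theirs[b] * PAYOFF[a][b]
        for a in range(3)
        for b in range(3)
    )


def best_response(theirs):
    """Index of the single action that earns the most against `theirs`."""
    values = [sum(theirs[b] * PAYOFF[a][b] for b in range(3)) for a in range(3)]
    return max(range(3), key=lambda a: values[a])


if __name__ == "__main__":
    weak_opponent = [0.6, 0.2, 0.2]        # assumed: plays rock too often
    unbeatable = [1 / 3, 1 / 3, 1 / 3]     # even mix, cannot be exploited
    always_paper = [0.0, 1.0, 0.0]         # exploits the rock-heavy habit
    print(expected_value(unbeatable, weak_opponent))
    print(expected_value(always_paper, weak_opponent))
    print(best_response(weak_opponent))
```

Here the even mix only breaks even against the rock-heavy opponent, while always playing paper earns about 0.4 per round on average. The catch is that always playing paper could itself be exploited by a sharper opponent, which is the risk the unbeatable strategy avoids.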

Although the study of poker may seem like just fun and games, advances in game theory can have real-world applications in areas such as airport security, coast guard patrols, and health care, in which people must make decisions using the limited information available to them. “I think this is an exciting step that this paper makes, and I think it’s part of a broader development” in such algorithms, says Vincent Conitzer, a computer scientist at Duke University in Durham, North Carolina. “More and more we’re able to apply them directly to real-life games, whether they be poker or these kinds of strategic situations that come up in security.”
