Libratus: Will Robots Take Over the (Poker) World?
2017 may well be seen in the poker world as the year of the bot. In January of this year, an artificial intelligence (AI) called Libratus, built by researchers at Carnegie Mellon University, managed a feat long thought impossible with existing technology: it beat seasoned pros at heads-up no-limit hold’em. In context, this win is just the latest in a string of fairly stunning AI successes in game-playing, an area that, at the highest levels, was once thought to be distinctly human.
The trend began in 1996, when an IBM computer called Deep Blue did the unthinkable: it beat Garry Kasparov, then the world’s top chess player, in the first game of a six-game match. Kasparov went on to win that match 4-2, but an upgraded Deep Blue beat him 3.5-2.5 in a six-game rematch the following year, capitalizing on an early mistake by its human opponent in the final game.
Before that defeat, games were seen as a human domain. Computers were well established as a great gaming platform, but it was generally believed that the insight and skill required to play games like chess at the highest level would always remain a human strength. Kasparov’s loss showed that artificial intelligence could, potentially, supplant humans at the art of game-playing.
Chess is a comparatively simple game, however. Its moves are tightly constrained, which always made it a natural target for machine intelligence. The number of possible chess games is astronomically high from a human perspective, but it is a far more tractable search problem for a machine, especially at today’s processing speeds. Deep Blue needed a supercomputer in 1996, yet two decades of improvements have made computers consistently stronger than any human at chess, even though the game has never been formally solved.
The next big challenge for game-playing computers was Go, the ancient Chinese strategy game which, like chess, is a somewhat stylized battle game. Go’s complexity dwarfs that of chess: Claude Shannon’s classic estimate puts the number of possible chess games at about 10^120, while a 19x19 Go board alone has roughly 10^170 legal positions. Even after Deep Blue’s victory over Kasparov, Go at the highest level was still seen as too complex for a computer ever to beat a top human opponent.
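The 10^170 figure for a 19x19 board counts board positions, and it can be sanity-checked with a quick upper bound. Each of the 361 intersections is empty, black, or white, so there are at most 3^361 configurations; only a fraction of those are legal positions, which is why the commonly cited count sits a little lower, at roughly 10^170.

```latex
3^{361} = 10^{361 \log_{10} 3} \approx 10^{361 \times 0.4771} \approx 10^{172.2} \approx 1.7 \times 10^{172}
```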
That all changed in 2016, when AlphaGo, a program built on deep neural networks by Google’s DeepMind, defeated Lee Sedol 4-1 in a five-game match. Lee, a nine-dan Go professional, is considered one of the top players in the world, and the victory drove home the dominance of artificial intelligence in strategy games involving complete information.
Poker has always been seen as different from games like chess and Go. In most other games, players have complete information about the current state of the game. At any given moment, both players in a chess match can see everything about the game so far: past moves, the current board layout, and so on. Nothing about an opponent’s past or current moves is hidden. In poker, however, incomplete information is the name of the game.
There is no point during a poker hand where any player has all the information. Each player’s hole cards are hidden from everyone else, so our decisions as players never rest on the kind of information a chess or Go player has. Instead, poker players must guess at their opponents’ holdings while simultaneously trying to misrepresent their own.
This feature of poker makes machine-based solutions very tricky to implement, because misrepresentation is hard to produce in a traditional solution tree that examines all possible moves. Even so, that obstacle was overcome for the limit game in 2015, when a University of Alberta bot called Cepheus essentially solved heads-up limit hold’em.
The final frontier of no-limit hold’em was crossed in early 2017 when Libratus defeated four poker pros (Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou) at heads-up no-limit hold’em. Libratus won the matches handily, with the human players finishing down a total of $1,766,250 against it.
The no-limit variety is seen as the most significant challenge in computer poker because bet sizes are limited only by the players’ stacks. In the solved limit version of the game, every bet and raise is a fixed amount, so a player facing a bet has only three choices: fold, call, or raise. In no-limit those three choices still exist, but the raise can be any amount from the minimum raise (set by the blinds and the previous bet) up to an all-in. The AI has to develop a strategy for a raise of two big blinds, another for a raise of three big blinds, and one for every possible increment in between and beyond, up to an all-in bet.
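A rough sketch of how much larger the raise menu gets, assuming 50/100 blinds, 20,000-chip stacks, and whole-chip bet increments (all illustrative assumptions, not match rules): the hypothetical `raise_to_options` lists every legal "raise to" total for the first preflop raise, while the hypothetical `limit_raise_to_options` returns the single raise a fixed-limit game allows.

```python
def raise_to_options(stack, current_bet, last_raise_size, chip=1):
    """Every legal no-limit 'raise to' total, in whole-chip steps up to all-in.

    Hypothetical helper: assumes the minimum raise repeats the last bet or raise
    increment and that any larger amount, up to the full stack, is allowed.
    """
    if stack <= current_bet:
        return []  # cannot raise at all, only call (all-in) or fold
    min_total = current_bet + last_raise_size
    if min_total >= stack:
        return [stack]  # too short for a full raise: all-in is the only raise
    totals = list(range(min_total, stack + 1, chip))
    if totals[-1] != stack:
        totals.append(stack)  # all-in is always available, even off the chip grid
    return totals


def limit_raise_to_options(current_bet, fixed_bet):
    # In fixed-limit play a raise is always exactly one fixed increment.
    return [current_bet + fixed_bet]


# First preflop raise with 50/100 blinds and 20,000-chip stacks (assumed numbers).
no_limit = raise_to_options(stack=20_000, current_bet=100, last_raise_size=100)
print(len(no_limit), no_limit[0], no_limit[-1])  # 19801 200 20000
print(len(limit_raise_to_options(current_bet=100, fixed_bet=100)))  # 1
```

Under these assumptions a single no-limit decision offers 19,801 raise sizes against one in limit, and every later decision in the hand multiplies the tree again.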
Between the hidden information and the expanded betting options, brute-force computing was always going to be a losing proposition for no-limit hold’em. As with Go, the data set a brute-force solution would need is unrealistic even in today’s computer world, and incomplete information adds a layer of difficulty that puts brute force out of reach entirely.
Instead, new game-playing AI algorithms use an iterative self-play learning process called counterfactual regret minimization, or CFR. When first switched on, algorithms like Libratus have little more than the basic rules of the game programmed into them. From there, they play trillions of hands against copies of themselves, building up a record of situations and responses from which a strategy emerges. Self-play can also be supplemented with games against live human opponents.
In essence, rather than feeding the algorithm a huge data set of all possible games, the algorithm builds that data set itself through iterative play, highlighting situations and strategy that are successful. "We give the AI a description of the game. We don't tell it how to play," explained Noam Brown, a CMU grad student.
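To make the self-play loop concrete, here is a minimal sketch of CFR, assuming a toy game in place of hold’em: Kuhn poker, a classic three-card game (Jack, Queen, King) where each player antes one chip, receives one card, and may check, bet one chip, call, or fold. This is not Libratus’s code, which ran far more elaborate CFR variants at vastly larger scale, and the names `InfoSet`, `cfr`, `train`, and `game_value` are hypothetical. The variant shown is chance-sampled CFR: each iteration deals one random hand and walks the whole betting tree. Each information set (a player’s own card plus the betting so far) keeps running regrets, regret matching turns them into the next strategy, and it is the average strategy across iterations that approaches equilibrium.

```python
import random
from collections import defaultdict

PASS, BET = "p", "b"  # "p" = check or fold, "b" = bet or call
ACTIONS = (PASS, BET)


class InfoSet:
    """Running totals for one information set: own card + betting history."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self, own_reach):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        for i in range(2):
            self.strategy_sum[i] += own_reach * strategy[i]
        return strategy

    def average_strategy(self):
        # The average over all iterations, not the latest strategy, converges.
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def payoff(history, cards):
    """Payoff to the player about to act if the hand is over, else None."""
    me = len(history) % 2
    higher = cards[me] > cards[1 - me]
    if history == "pp":  # check, check: showdown for the 1-chip antes
        return 1 if higher else -1
    if history in ("bp", "pbp"):  # opponent folded to my bet
        return 1
    if history in ("bb", "pbb"):  # bet and call: showdown for 2 chips
        return 2 if higher else -2
    return None


def cfr(infosets, cards, history, reach0, reach1):
    """One CFR pass over the betting tree; returns the value for the player to act."""
    value = payoff(history, cards)
    if value is not None:
        return value
    me = len(history) % 2
    node = infosets[str(cards[me]) + history]
    strategy = node.current_strategy(reach0 if me == 0 else reach1)
    action_values = []
    for prob, action in zip(strategy, ACTIONS):
        if me == 0:
            child = cfr(infosets, cards, history + action, reach0 * prob, reach1)
        else:
            child = cfr(infosets, cards, history + action, reach0, reach1 * prob)
        action_values.append(-child)  # zero-sum: the opponent's loss is my gain
    node_value = sum(p * v for p, v in zip(strategy, action_values))
    # Counterfactual regret, weighted by how likely the opponent was to get here.
    opp_reach = reach1 if me == 0 else reach0
    for i, v in enumerate(action_values):
        node.regret_sum[i] += opp_reach * (v - node_value)
    return node_value


def train(iterations, seed=0):
    rng = random.Random(seed)
    infosets = defaultdict(InfoSet)
    for _ in range(iterations):
        cards = rng.sample([0, 1, 2], 2)  # 0 = Jack, 1 = Queen, 2 = King
        cfr(infosets, cards, "", 1.0, 1.0)
    return infosets


def value_of_average(infosets, cards, history=""):
    value = payoff(history, cards)
    if value is not None:
        return value
    strategy = infosets[str(cards[len(history) % 2]) + history].average_strategy()
    return sum(-p * value_of_average(infosets, cards, history + a)
               for p, a in zip(strategy, ACTIONS))


def game_value(infosets):
    """Exact value for the first player of the average strategies, over all 6 deals."""
    deals = [(a, b) for a in range(3) for b in range(3) if a != b]
    return sum(value_of_average(infosets, d) for d in deals) / len(deals)
```

For example, `game_value(train(50_000))` should come out close to -1/18 (about -0.056), the known equilibrium value of Kuhn poker for the first player. Nobody tells the program how to play: it learns on its own to fold a Jack and call with a King when facing a bet, because those choices are strictly better.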
The CFR method is excellent at "teaching" itself the game, but it still builds an unmanageable database of situations. To solve that, the researchers added a second layer of AI called the "end-game solver," which works out the best play in the moment. The end-game solver analyzes the current state of the hand and computes a refined strategy for the rest of it, rather than relying only on the main CFR strategy.
And, as if in a nod to the real world of poker pros, the final product had an extra cherry on top. At the end of each day of play, the researchers ran the day's hands through a pattern analyzer, looking for spots where the humans had found weaknesses in the bot's strategy so those holes could be patched before the next day's play. Most poker players can respect the habit of reviewing your play for weaknesses, both your own and your opponents', and it's fascinating to see AI researchers build a machine version of essentially the same process as the final way to balance their AI's play.
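A minimal sketch of what such an overnight review might look like, assuming it focuses on bet sizing: the hypothetical `patch_candidates` function scans the day's opponent bets, expressed as fractions of the pot, and returns the sizes that fall far from every size the bot's strategy already models, most frequent first. Those are the spots a review would flag for extra work before the next session. The pot-fraction encoding, the `tolerance`, and `top_k` are illustrative assumptions, not details of Libratus.

```python
from collections import Counter


def patch_candidates(observed_fractions, abstraction, tolerance=0.1, top_k=2):
    """Return the off-abstraction bet sizes opponents used most often.

    observed_fractions: opponent bet sizes from the day's hands, as pot fractions.
    abstraction: the bet sizes (pot fractions) the strategy already models.
    """
    off_tree = Counter()
    for size in observed_fractions:
        gap = min(abs(size - modeled) for modeled in abstraction)
        if gap > tolerance:
            off_tree[size] += 1  # this size lands far from anything modeled
    return [size for size, _ in off_tree.most_common(top_k)]


day_log = [0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.3, 1.3, 3.0]
print(patch_candidates(day_log, abstraction=(0.5, 1.0, 2.0)))  # [0.75, 1.3]
```

Ranking by frequency reflects the same priority a human review would have: fix the leaks opponents lean on most often first.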
The combination of methods produces some surprising results from the perspective of human opponents. Daniel McAulay noticed something interesting while the algorithm was taking his money. Quoted in Wired, McAulay said the bot “splits its bets into three, four, five different sizes ... No human has the ability to do that.”
There is still one hill left to climb in the human-versus-computer poker war: full-ring no-limit hold’em. To date, these algorithms can only develop effective strategies for the heads-up game, which obviously has far fewer permutations to deal with than a full-ring game. New methods will likely be needed before that hill is taken, since current learning techniques are largely built around two-player games.
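One narrow way to see the gap, assuming a nine-handed full ring: count only the ways to deal two hole cards to every seat, ignoring the board and all betting sequences (which grow much faster still). The helper `hole_card_deals` is hypothetical and treats seats as distinguishable.

```python
from math import comb, prod


def hole_card_deals(players, deck=52):
    """Ways to deal two hole cards to each of `players` distinguishable seats."""
    return prod(comb(deck - 2 * seat, 2) for seat in range(players))


for players in (2, 6, 9):
    print(players, f"{hole_card_deals(players):.2e}")
```

Heads-up there are 1,326 × 1,225 = 1,624,350 ways to deal the hole cards; at a nine-handed table the count is roughly 5 × 10^26, before a single bet is made.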
Libratus has clearly shown, however, that no-limit hold'em is a game that can be mastered by computers, even if only in its simplest heads-up form at the moment, and the push over that final hill is only a matter of time, computing power, and algorithm optimization. With improvements in the end-game solver (or the addition of more end-game solvers focused on each player at the table), the day is in sight when a bot will be able to beat the best at full-ring no-limit hold'em. One of the last bastions of human dominance, games of chance with incomplete information, may soon fall to the machine overlords of the future. Hold on to your wallets.