How Prague researchers co-authored a revolutionary AI in Alberta that became the first to beat professionals at Texas hold’em
Martin Schmid, Matej Moravčík (PhD students at Department of Applied Mathematics, Charles University) and Viliam Lisý (Assistant professor, Czech Technical University) are three of several authors of a recent paper published in this month’s edition of Science reporting on the development of a revolutionary AI called DeepStack which they helped design at the University of Alberta.
In this interview I ask all about how the team did it. Highlights: how the team in Alberta just pipped Carnegie Mellon University in the US which saw similar success with their own AI soon after.
Excerpts from our interview, taped at Charles University:
It's a pleasure to meet you all, all three of you were involved in this project together with the University of Alberta. The AI is called DeepStack and this is an AI that knows how to play - not only to play - but how to win at the game of Poker (Texas Hold'em). In terms of the background, how did this project come together and how long was it in the making?
Martin Schmid: "Matej and I were working on poker for eight years and the people at the University of Alberta for the past 20 or so. We all knew each other from different AI conferences and this particular project was launched last March, when we joined the university's computer research team and worked on the DeepStack algorithm for the next year."
Games have a special place when it comes to artificial intelligence: in the past programmed AI has beaten chess masters, IBM designed Watson that won on Jeopardy! What is the difference when it comes to a game like poker?
Matej Moravčík: "The biggest difference is that in poker you have imperfect information. You know what cards you are holding but not your opponent and you both know what is on the table. The fact that your information is limited presents much more of a challenge but is also more realistic than situations where everything is known."
In a game of chess, you see all the pieces on all the spaces. You can observe the game state at any given moment and make decisions on perfect information you can see. But of course a computer or AI can make many more and much faster calculations which gives it an advantage even over chess masters, correct?
MM: "That's it exactly."
How do you train an AI to bluff? To work with imperfect information...
Viliam Lisý: "It was the culmination of some 50 years of game theory and the main idea for the algorithm that we eventually used was that we improved the strategy for 'both' players at once using self-play. Imagine that the AI is playing poker against itself many, many times, billions of games and in each game it tries to learn from experience what was a good move, what was a bad move. And bluffing, actually, is just a consequence of the rules. Because in poker if you always bet only when you have strong cards, your opponent learns early when to fold. But f you want to play optimally, you sometimes have to bet with strong cards and sometimes with bad cards so that you hide this information.
“[DeepStack] in a way was the culmination of some 50 years of game theory.”
"We gave the computer the exact rules but it had to learn how to play the game on its own. We didn't give it any advice, and sense of good or bad moves or how to bluff. We just gave it the rules, let it play itself, and come up with a strategy on its own."
Does that mean that what it learns in this game can be applied in other areas as well?
VL: "It can definitely be applied in other games: what it learned is not important but how it learned. And that could definitely be modeled to any other scenario where there is imperfect information in a game. But it is not a general purpose AI where we would want it to learn path planning for robots and for the same AI to a slew of other things. It's specific to imperfect information games."
You said that this success was built on many years of research. In the case of this project, were there setbacks or dead-ends when you weren't sure whether it would work out the way it did?
Martin Schmid: "I was personally surprised by how smoothly things went, especially given the ambitious nature of the project. In total, 10 people worked on DeepStack for almost a year, so that gives you an idea of how intense it was."
Matej Moravčík: "We began the first coding last March and the matches with human players was in November. We did some preliminary testing before that with some in-house players to check if the strategy of the AI was reasonably strong."
And the first signs were good...
MM: "Yes. The team in Alberta had some professionals who had tested their best bot every year or two for about 10 years and the answer from the players was always 'Not yet'. Last year it was different and it was the first time that they were happy with the strategy."
Is it a genuine milestone for a poker player, a professional poker player, to lose to an AI?
Viliam Lisý: "It is definitely a milestone that that happened and it is even more: the milestone is also in the algorithm which we developed because it is really substantially different from any other algorithm used for imperfect information games before. We believe that it brings really different applicability to these game models because we can use much smaller computational resources to come up with such strong strategies. So, our strategy can run on a simple gaming machine on one GPU. And that was impossible before.
"Previously, similar results were thinkable and one of the results you are probably referring to was by a huge supercomputer which was necessary for doing things the old way. Now with the the new algorithm we really reached a new milestone which allows us to use imperfect information games in a far more efficient computational way."
“We gave the computer the exact rules but it had to learn how to play the game on its own.”
So we will be able to try that shortly?
VL: "Maybe..." (Laughter)
It must have been a lot of fun when it started beating players and the realization set it in the project was a success!
Martin Schmid: "Unfortunately, there wasn't much time to celebrate once that was achieved because we were rushing quite a bit to get the results published. We were writing quite quickly, doing additional experiments because we knew we were also racing against the clock. A similar project by Carnegie Mellon University, which you mentioned earlier, was also rushing to get a result and we wanted to be the first."
I think theirs was in January?
MS: "Yes. It is important to say that their Libratus AI did beat four top poker players in January. But we did it in November/December. But we couldn't talk about it. We just published in Science AAAS and the embargo is very strict when publishing with that magazine. Second, as Viliam mentioned, the breakthrough here was also that the algorithm used was a breakthrough. That is why this was published this month in Science. Libratus is a great achievement but the algorithm they used is more evolutionary than revolutionary."
Would you explain the difference?
"DeepStack and Libratus are surprisingly similar when it comes down to the end of the gameplay. In poker there are four rounds you play in which cards are dealt and when the game is almost over both systems use something which is called endgame solving to try and figure out the end of the game. It's nice that both teams came up with an algorithm in the endgame that works similarly. But if you go to the beginning of the game, what DeepStack does is that it uses neural networks to predict what will happen later in the game. Libratus uses the rather older technique of abstraction, where it solves everything that can happen in advance, on a smaller abstracted game. Then it tries to map a solution from the smaller abstracted game while it plays."
Now, when you say that it beat a person for the first time, how many times did a particular player play against it?
Viliam Lisý: "That is a very important part of the study. We really tried to design it so it is all statistically significant. We approached the International Poker Federation to provide us with some players and they did, 33 players from 17 different countries. We asked them to each play 3,000 hands, which may seem like not that much - because there is a large amount of chance in poker - but we also developed a very sophisticated variance reduction technique that allows us to tell apart luck... and actual skill in the game. So even with such a number of hands we were able to decide about 10 out of 11 players who completed the number of hands that the AI had beaten them with statistical significance and we beat the last one as well but the confidence interval of the result was too wide to say that we had beaten him with me than 95 percent probability. For him it was less, for everyone else, at least 95 percent."
“The milestone is not just that it won, but that the algorithm is substantially different from any other used for imperfect information games before.”
We have the laptop here - can I try it?
VL: "Ok, so those are DeepStack's cards and we don't see those, of course. And these are yours, which the algorithm doesn't see. And in the center are the public cards. And this is the current size of the pot. The pot has been raised by 900 chips so we can fold or call and add another 600 chips or raise."
If you raise, that's an indication for anyone, AI or human, that you are feeling pretty "confident" about your hand, which we might not want to give away the truth about at this point because together with my hand I have two fives. Not terribly strong! But still: a pair beats high card.
VL: "Yes."
What would you do then?
VL: "I'm not that good a poker player, none of us is, actually... So we can just call, let's say we check." (Laughs)
Ok.
VL: "Now it is DeepStack's turn and he..."
Has raised another 2,400...
VL:"So we can re-raise and see if it folds."
Ok.
VL: "Now it is your turn again, what should we do?"
I guess 'check'.
VL: "OK... And the bot went all-in."
Oh no!
VL: "And you have just a pair of 5s."
If we fold we don't get a chance to see... Let's call.
VL: "So now we find out and he... the AI was bluffing actually.
So we won! With a pair of fives! That's better than I usually do.
VL: "Yeah! That's poker. You need to bluff to be optimal so... This time we were lucky, next time we won't be!"