Go, AI and Game Theory

I haven’t played Go but have been reading about it lately, since it turned out that an AI learned to beat the world champion at it. The significance of this is that those who knew the game of Go believed it was out of a machine’s reach to learn to play it better than the best human. This is because, unlike in chess, brute computational force is not a competitive advantage. Go is too complex and, thus, there is an art to playing it.

Even though I don’t know much about the game, it is interesting to read accounts of how DeepMind’s AlphaGo was able to defeat Lee Sedol.

At first, Fan Hui thought the move was rather odd. But then he saw its beauty.

“It’s not a human move. I’ve never seen a human play this move,” he says. “So beautiful.” It’s a word he keeps repeating. Beautiful. Beautiful. Beautiful.

There were no complaints about the unfair computational ability here. Instead, an appreciation of something new — in my reading, a new strategy. In other words, an innovation.

How did the innovation arise? For it to be one, no human could have known of it beforehand, including those ‘programming’ AlphaGo (if that is the right word now). This seems to be the case. Instead, AlphaGo learned to play by being fed the data from thousands of games, presumably including those played by Lee Sedol. Interestingly, this is similar to how top Go players are trained. But AlphaGo was also learning as it played Fan Hui, the European champion it defeated a few months ago. Interestingly, Fan Hui was learning too, and through their interaction his game has dramatically improved, with his ranking moving from 633 into the 300s. In other words, AlphaGo had the corpus of knowledge of past Go games and their players but was being trained by someone who was far from the best. Fan Hui was not the number 2 player in the world or even a Top 10 player.

This achievement is monumental, but the game theorist in me is still unsure whether that is it for human Go players. There are things we don’t know. For instance, AlphaGo may have trained to play one person, Lee Sedol, and may well lose to others. Fan Hui has defeated AlphaGo in unofficial games. In particular, how would AlphaGo fare against people who made more mistakes? My point here is that AlphaGo may have been trained to know who it is playing, but what happens when it doesn’t know that? Game theory tells us that your tactics will change depending upon who you play against (just think about scissors-biased people in Rock-Paper-Scissors). A human Go champion knows what they are up against, but what does AlphaGo know? My assumption here is that it may have known too much. (By the way, this is perhaps the reason why the best chess players in the world combine a team of humans with an AI rather than an AI alone.)
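The Rock-Paper-Scissors point can be made concrete with a small sketch. The opponent mixes below are hypothetical numbers chosen for illustration, not data about real players, and the function names are my own. Against a perfectly balanced opponent every move does equally well, but against someone who overplays scissors the best response shifts to rock.

```python
# Best response in Rock-Paper-Scissors against a known opponent mix.
# Payoffs are from our point of view: +1 win, 0 tie, -1 loss.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Payoff of playing `mine` against `theirs`."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoffs(opponent_mix):
    """Expected payoff of each pure move against an opponent's mixed strategy."""
    return {
        m: sum(p * payoff(m, o) for o, p in opponent_mix.items())
        for m in MOVES
    }


def best_response(opponent_mix):
    """Pure move with the highest expected payoff against the given mix."""
    ev = expected_payoffs(opponent_mix)
    return max(ev, key=ev.get)


# Hypothetical opponents (assumed probabilities, for illustration only).
uniform = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
scissors_biased = {"rock": 0.25, "paper": 0.25, "scissors": 0.5}

print(expected_payoffs(uniform))
print(expected_payoffs(scissors_biased))
print(best_response(scissors_biased))
```

The same logic is what makes the question about AlphaGo interesting: a strategy tuned to one opponent's tendencies can be the wrong strategy against another.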

This suggests some other implications. First, it would be interesting to test AlphaGo when it does not know the identity of its opponent, or where the opponents switch while the same AI entity keeps playing. Second, AlphaGo is a serious new player, but this does not necessarily imply that it or its descendants cannot be defeated. Instead, there is a new set of learning that needs to occur. AlphaGo has innovated, but we do not know if that innovation can only be carried out by an AI or if it can be imitated and understood by human players. Third, because Go is so complex, learning it involves a different stock of knowledge and set of experiences for each AI that attempts it. That means AIs will play each other. Apparently, AlphaGo learned this way, but was it playing itself or a different AI that had learned to play the game independently? The difference may matter.

Finally, and this is an important one: AlphaGo got where it did because it stood on the shoulders of human giants. In other situations, DeepMind’s AI has learned from scratch and played other computers (as in its mastery of Space Invaders). Thus, at the moment, AI is not able to learn something like this independently of the 2,500 years of knowledge accumulated by humans. It will be a different matter if an AI can learn independently and defeat humans without ever seeing a human play.

This story isn’t over yet.

3 Replies to “Go, AI and Game Theory”

Henry 3 Dogg says:

March 13, 2016 at 1:28 pm

You never did need AI to beat a human at Go. Just a very fast computer, and some trivially simple software.
It is impossible to tell from the article to what extent this is smart AI, as claimed, or fast computing.

CdrJameson says:

March 15, 2016 at 9:11 am

This is a neural network, so it isn’t fast computing.
All the effort happens ahead of time training up the network.
Playing the game is a very small part of the computation going on here.

The main problems with this current system are that it relies on a massive corpus of human games to learn from and that it simulates ‘system 1’ intelligence (intuition).

The relying-on-human-example problem means that novel strategies will have a better chance to beat it (although it’s been playing against itself recently, so may have some of its own innovations). It also bodes ill for applying this to other domains, especially if there are biases in the training data – it’s very easy to accidentally train a system 1 AI to be sexist/racist/whatever if that is present in the example data.

The ‘system 1’ problem means that it can’t directly reveal how it works, i.e. it can play Go well, but doesn’t have any neatly coded explicit ‘strategy’ that we could learn from. It’s a bit of a magic black-box (which is fair enough, our brains are too). We can pick the neural network apart but there’s no guarantee that would reveal anything generally useful.

What’s still confusing me is what’s different about this vs 1980s Neural Networks, and I’m not sure I’m seeing anything yet. The real problem with Neural Nets is having some way of figuring out how complex a net you need to solve any given problem. Too small and you’ll literally never get a well functioning system, and you need a better answer to ‘is my net too small?’ than ‘try it and see!’.

Personally I like it. As an AI person I never got on with the maths-heavy ‘system 2’ approaches that were used for Chess playing systems. I just can’t quite figure out why it’s news now.

PKS says:

March 17, 2016 at 12:53 am

As someone who plays Go and has taught game theory, I see some misconceptions in the post and comments here. I watched the games and got a basic feel for how AlphaGo works, so let me summarize:

First, what are the problems:

Issue 1: Go is big. It’s a perfect information game with a finite game tree, so in theory you should be able to solve it by backward induction. Unfortunately, the number of possible game endings (terminal nodes) is about 10 billion times the square of the number of atoms in the universe, so a computer made of all the matter in the universe running until the end of time would evaluate almost none of the end game states.

Solution for 1: Search only some of the tree since most of the branches are due to obviously bad moves.

This creates 2 issues:

Issue 2: Which branches should be searched?

Issue 3: Given that we don’t search all the branches, or all the way down to the end game state, we technically don’t know the value of a branch, since some branch we didn’t check might have a payoff that refutes the strategy that otherwise seemed optimal.
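To see what backward induction looks like when the tree is small enough, here is a minimal sketch on a toy game of my own choosing (not Go): players alternately take 1, 2 or 3 stones from a pile, and whoever takes the last stone wins. The function names are hypothetical. The second function counts terminal nodes, which is the quantity that explodes in Issue 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(stones):
    """Backward induction: +1 if the player to move wins with perfect play, else -1."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1
    # Our value is the best of the negated values of the positions we can move to.
    return max(-value(stones - take) for take in (1, 2, 3) if take <= stones)


def count_leaves(stones):
    """Number of distinct complete games, i.e. terminal nodes of the game tree."""
    if stones == 0:
        return 1
    return sum(count_leaves(stones - take) for take in (1, 2, 3) if take <= stones)


for n in (4, 5, 10, 20):
    print(n, value(n), count_leaves(n))
```

Even in this trivial game the count of terminal nodes grows geometrically with the pile size. With a much larger branching factor and much longer games, exhaustive evaluation stops being an option, which is where Issues 2 and 3 come from.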

To solve these issues jointly, DeepMind developed a “nested” neural net architecture. A neural net is essentially a nonparametric estimation procedure that approximates relationships between datapoints. In this case, the datapoints are different nodes of a Go game tree.

To solve Issue 2, a neural net is developed that chooses a selected set of moves to investigate further. To solve Issue 3, a second neural net is developed to give AlphaGo a sense of what value to assign to a position. In a broad sense, this is supposed to mimic human Go thought processes: the first neural net is “intuition” about promising moves, and the second is “reading”, in which the moves are evaluated through simulation of likely moves in the future.
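The division of labour between the two nets can be sketched on a toy game. This is not AlphaGo's code: the game is a take-1-2-or-3-stones game (last stone wins), the two "nets" are hand-written stand-in heuristics, and the search is a plain depth-limited negamax that I have swapped in for whatever search AlphaGo actually runs. All names and parameters (`policy`, `value_estimate`, `top_k`, `depth`) are assumptions for illustration.

```python
def legal_moves(stones):
    """Moves available in the toy game: take 1, 2 or 3 stones."""
    return [t for t in (1, 2, 3) if t <= stones]


def policy(stones):
    """Stand-in for the 'intuition' net: a score per legal move, higher is more promising.
    Hypothetical heuristic: favour moves that leave a multiple of 4."""
    return {t: (1.0 if (stones - t) % 4 == 0 else 0.1) for t in legal_moves(stones)}


def value_estimate(stones):
    """Stand-in for the value net: a guess in [-1, 1] for the player to move."""
    return -0.5 if stones % 4 == 0 else 0.5


def search(stones, depth, top_k):
    """Return (value, best_move) for the player to move.

    Issue 2: only the top_k moves ranked by `policy` are explored.
    Issue 3: at the depth limit, `value_estimate` replaces the unknown true value.
    """
    if stones == 0:
        return -1.0, None  # the player to move has already lost
    if depth == 0:
        return value_estimate(stones), None
    scores = policy(stones)
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
    best_value, best_move = -float("inf"), None
    for take in candidates:
        child_value, _ = search(stones - take, depth - 1, top_k)
        if -child_value > best_value:  # the opponent's value is the negative of ours
            best_value, best_move = -child_value, take
    return best_value, best_move


print(search(10, depth=2, top_k=2))
```

The search is only as good as the two stand-ins: a poor `policy` prunes away the right move, and a poor `value_estimate` misjudges the positions where the search stops, which is why so much effort goes into training them.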

However, neural net development is very data intensive and slow, so the 80,000 games used as training data only act as a starter batch to give AlphaGo a very vague feel for what a reasonable play is. The real development happens when AlphaGo plays itself for millions of games, as that is the number required for it to make real progress in developing an accurate neural net. Most extraordinarily, the games used as the starter batch are merely those of strong amateurs on an online Go server; it really has no idea how to play like a professional, since it has never seen a professional play. Moreover, it didn’t learn from Fan Hui or Lee Sedol while playing them: DeepMind did not allow their games to affect the neural net estimation, I guess for diagnostic reasons. Even if they had been included, they wouldn’t have done anything, since they would have been so little data. In fact, of the hundreds of professional games each player has played, I’d suspect inputting them all into the estimation procedure would have had little effect because the estimation process is so slow. DeepMind staff estimated it would take millions of Lee Sedol games for AlphaGo to develop anti-Lee Sedol tactics, and then it would apply them to every opponent since it can’t distinguish who it’s playing.
