AI Plays Games


Go player Ke Jie competes against Google’s AlphaGo (Depositphotos, enhanced by CogWorld)

Artificial Intelligence (AI) was all the rage in the 1980s. Specifically, companies invested heavily to build expert systems – AI applications that captured the knowledge of acknowledged human experts and made it available to solve narrowly defined types of problems. Thus, expert systems were created to configure complex computer systems and to detect likely credit card fraud. This earlier round of AI was triggered by a series of successful academic expert applications created at Stanford University. One, called Dendral, analyzed mass spectra data and identified organic molecules – something that, previously, only a few chemists could do. Another expert system, called Mycin, diagnosed potential meningitis infections. In a series of tests, it was shown that Mycin could diagnose meningitis as well as human meningitis experts, and it even did slightly better, since it never overlooked possible drug incompatibility issues.

The expert systems developed in the Eighties all followed the general approach pioneered by Dendral and Mycin. Human experts were interviewed and rules for analysis and design were defined. An expert system took anywhere from a few hundred to many thousands of rules to achieve expert performance. When an error was identified, the human analysts and the experts had to carefully review all the rules and modify them until the system performed as desired. Ultimately, the expert systems approach was dropped because the costs of developing and maintaining the systems were too great. At the expert level, knowledge evolved so rapidly that, in effect, one was constantly involved in revising the systems.

The point of reviewing all this is that it was the successful early expert systems, created in the Seventies, that convinced companies to learn about AI and to launch the many AI initiatives that dominated business computer groups in the early Eighties.

Today we are witnessing a renewed interest in AI. This time the effort has been triggered by the news coverage given to several successful game playing applications developed in the recent past.

IBM’s Watson Plays Jeopardy!

In the 1990s IBM created Deep Blue, an AI application specifically designed to play chess. It was the latest in a series of chess-playing programs that IBM developed, and in 1997, during its second challenge match with Garry Kasparov, the reigning world chess champion, Deep Blue won the match. (Deep Blue won two games, Kasparov won one, and three games were drawn.) Those who studied the software architecture of Deep Blue know that it depended on “brute force,” a term computer people use to indicate that the system relied more on its ability to search millions of examples and evaluate millions of possibilities in a few minutes than on its ability to reason. Specifically, Deep Blue used an approach that looked forward several moves for each reasonable “next move” and then chose the move that would yield the highest number of points. The fact that Deep Blue defeated a human world champion was impressive, but it didn’t immediately suggest any other applications, since the program was highly specialized to evaluate a chessboard and select the next best chess move.
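The look-ahead-and-score approach described above can be sketched as a toy minimax search. The game tree and point values below are invented for illustration; Deep Blue’s real evaluation function and search (with extensive pruning) were far more elaborate.

```python
# Toy illustration of brute-force game-tree search (minimax):
# look ahead, assume the opponent minimizes our score, and pick
# the move with the best guaranteed outcome.

def minimax(node, maximizing):
    """Return the best achievable score from this position."""
    if isinstance(node, (int, float)):   # leaf: a scored board position
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# A tiny 2-ply game tree: each inner list is the opponent's replies.
tree = [
    [3, 5],    # move A: opponent can hold us to 3
    [2, 9],    # move B: opponent can hold us to 2
    [6, 7],    # move C: opponent can hold us to 6
]
best = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
print(best)  # prints 2, i.e. move C
```

Deep Blue’s real advantage was doing this kind of evaluation over millions of positions per second rather than the six leaves shown here.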

As the new millennium began, IBM was looking around for another challenging problem, and wanted to find one with more applications than chess. IBM also wanted to explore new techniques that were being developed in AI labs. In 2004, IBM began to consider developing an application that could play Jeopardy!.  Jeopardy! is a very popular TV game that draws large viewing audiences and offers some real challenges for a computer. In Jeopardy! contestants are given “answers” and asked to come up with the “question” that would lead to such an answer. The “questions” and “answers” used on Jeopardy! are drawn from a broad base of general knowledge on topics such as history, literature, science, politics, geography, film, art, music, and pop culture. Moreover, the game format requires that the contestants be able to consider the “answers” provided, which are often subtle, ironic, or contain riddles, and generate responses within about 3 seconds.

In essence, a Jeopardy!-playing application posed two different problems: understanding natural language well enough to be able to identify the right “answer” and then searching a huge database of general information for a “question” that fits the “answer.” Searching a huge database quickly was a more or less physical problem, but “hearing” and then “understanding” spoken English, and finally determining which of several possible answers was the right match for the question being asked, were serious cognitive problems.

In 2007, IBM established a team of 15 people and gave them 5 years to solve the problem. The team in turn recruited a large staff of consultants from leading university AI labs and set to work. The first version was ready in 2008, and in February of 2011 the software application, Watson, proved it could beat two of the best-known former Jeopardy! winners, Brad Rutter and Ken Jennings, in a widely watched TV match.

The key to Watson’s analytic functionality is DeepQA (Deep Question Answering), a massively parallel probabilistic architecture that uses and combines more than 100 different techniques—a mixture of knowledge and neural net techniques—to analyze natural language, identify sources, find and generate hypotheses, and then evaluate evidence and merge and rank hypotheses. In essence, DeepQA can perform thousands of simultaneous tasks in seconds to provide answers to questions. Given a specific query, Watson might decompose it and seek answers by activating hundreds or thousands of threads running in parallel.
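The generate-score-merge-rank shape of that pipeline can be sketched in a few lines. The two scorers below are hypothetical placeholders, not IBM’s actual DeepQA components; they only illustrate how many independent evidence scorers can run in parallel over candidate answers before their scores are merged and ranked.

```python
# Sketch of a DeepQA-style fan-out: score each candidate answer
# with several independent evidence scorers in parallel, merge the
# scores, and return the highest-confidence candidate.
from concurrent.futures import ThreadPoolExecutor

def keyword_overlap(clue, candidate):
    # Placeholder scorer: fraction of clue words found in the candidate.
    clue_words = set(clue.split())
    return len(clue_words & set(candidate.split())) / len(clue_words)

def length_penalty(clue, candidate):
    # Placeholder scorer: mildly prefer candidates near a typical length.
    return 1.0 / (1 + abs(len(candidate) - 10))

SCORERS = [keyword_overlap, length_penalty]

def score(clue, candidate):
    # Merge evidence from all scorers (here: a simple average).
    return sum(s(clue, candidate) for s in SCORERS) / len(SCORERS)

def answer(clue, candidates):
    # Fan the candidates out across threads, then rank by merged score.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda c: score(clue, c), candidates))
    ranked = sorted(zip(scores, candidates), reverse=True)
    return ranked[0][1]

print(answer("general washington led the continental army",
             ["george washington", "abraham lincoln"]))  # george washington
```

The real system merged evidence from over 100 scorers with trained weights rather than a flat average, but the parallel fan-out is the architectural point.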

Watson maintained all its data in memory to provide the speed it needed for Jeopardy! It had 16 terabytes of RAM and used 90 clustered IBM Power 750 servers, each with 32 cores running at 3.55 GHz. The entire system ran on Linux and operated at over 80 teraflops (i.e., 80 trillion floating-point operations per second).

To sum up: IBM demonstrated that AI-based natural language analysis and generation had reached the point where a system like Watson could understand open-ended questions and respond in real time. Watson examined Jeopardy! “answers,” defined what information was needed, and accessed vast databases to find that information. It then generated an English response in under 3 seconds. It did all this faster and better than two former human Jeopardy! winners and easily won the match.

Unlike Deep Blue, which was more or less abandoned once it had shown it could win chess matches, Watson is a more generic type of application. It includes elements that allow it to listen to and respond in English. Moreover, it is capable of examining a huge database to come up with responses to questions. Today, the latest version of Watson functions as a general purpose AI tool (some would prefer to call it an AI platform) and is being used by hundreds of developers to create new AI applications.

Fukoku Mutual Life Insurance Company in Tokyo (Japan), for example, worked with IBM’s Watson team to develop an application to calculate payments for medical treatments. The system considers hospital stays, medical histories, and surgical procedures. If necessary the application has the ability to “read” unstructured text notes, and “scan” medical certificates and other photographic or visual documents to gather needed data. Development of the application cost 200 million yen. It is estimated that it will cost about 15 million yen a year to maintain. It will displace approximately 34 employees, saving the company about 140 million yen each year, and thus it will pay for itself in 2 years. The new business process using the Watson application will drastically reduce the time required to generate payments, and the company estimates that the new approach will increase its productivity by 30%.
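The payback arithmetic quoted above works out as follows (all figures are the ones given in the text, in millions of yen):

```python
# Payback calculation for the Fukoku Mutual Watson application,
# using the figures quoted in the text (millions of yen).
development_cost = 200        # one-time development cost
annual_maintenance = 15       # estimated yearly maintenance
annual_labor_savings = 140    # savings from ~34 displaced employees

net_annual_savings = annual_labor_savings - annual_maintenance  # 125
payback_years = development_cost / net_annual_savings

print(round(payback_years, 1))  # 1.6, i.e. it pays for itself within 2 years
```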

Google’s AlphaGo

While IBM was working on its Jeopardy!-playing application, Google acquired its own AI group, and that group decided to illustrate the power of recent AI developments with its own game-playing system. Go is an ancient board game that is played on a 19×19 matrix. The players alternate placing black or white “stones” on the points created by the intersecting lines. The goal of the game is to end up controlling the most space on the board. Play is defined by a very precise set of rules.

When IBM’s Deep Blue beat chess grandmaster Garry Kasparov, in 1997, AI experts immediately began to think about how they could build a computer that could play and defeat a human Go player, since Go was the only game of strategy that everyone acknowledged was more difficult than chess. This can be exemplified by noting that the first move of a chess game offers 20 possibilities, whereas the first move in a Go game offers the first player a chance of placing a stone on any one of 361 intersections. The second player then responds by placing a stone on any one of the 360 remaining positions. A typical chess game lasts around 80 moves, while Go games can last for 150 turns. Both games have explicit moves and rules that theoretically would allow an analyst to create a branching diagram to explore all logical possibilities. In both cases, however, the combinations are so vast that exhaustive logical analysis is impossible; the number of possible games in either case is greater than the number of atoms in the universe. (The search space for chess is generally said to be 10^47, whereas the search space for Go is generally held to be 10^170.)
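A rough calculation gives a feel for how much faster Go’s possibilities explode. The bounds below are crude illustrations, not exact game-tree counts: Go’s branching factor starts at 361 and shrinks by one per ply, while chess is approximated here with a flat 20 moves per ply.

```python
# Crude upper bounds on the number of move sequences after 10 plies,
# to show why brute-force search works in chess but not in Go.
import math

def go_upper_bound(plies):
    """Naive Go bound: 361 * 360 * 359 * ... (one fewer point each ply)."""
    total = 1
    moves = 361
    for _ in range(plies):
        total *= moves
        moves -= 1
    return total

go_10_plies = go_upper_bound(10)
chess_10_plies = 20 ** 10   # crude: ~20 legal moves per chess ply

print(f"Go, 10 plies:    ~10^{int(math.log10(go_10_plies))}")     # ~10^25
print(f"Chess, 10 plies: ~10^{int(math.log10(chess_10_plies))}")  # ~10^13
```

Ten plies into the game, the naive Go bound is already about a trillion times larger than the chess bound, which is why Deep Blue’s brute-force strategy could not simply be transplanted to Go.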

In October 2015 AlphaGo, a program developed by DeepMind (a subsidiary of Google), defeated Fan Hui, the European Go champion, five games to none in a five-game Go match. In March 2016 an improved version of AlphaGo played a tournament in Seoul with the leading Go master in the world, Lee Sedol. AlphaGo won four of the five games.

So, how does AlphaGo work? The first thing to say is that the core of AlphaGo was not developed as a software package to play Go. The basic neural net architecture used in AlphaGo was initially developed to play Atari video games. The Atari-playing program was designed to “look” at computer screens (matrices of pixels) and respond to them. When DeepMind subsequently decided to tackle the Go-playing problem, it simply re-purposed the Atari software package. The input that AlphaGo uses is a detailed 19×19 matrix of a Go board with all the stones that have been placed on it. The key point, however, is that the underlying AlphaGo platform is based on a generic software package designed to learn to play games; it’s not a specially developed Go-playing program.

AlphaGo largely depends on two deep neural nets. A neural network is an AI technique that learns statistical patterns from large numbers of examples and estimates which patterns are most likely to lead to a desired result.

As already noted, the basic unit being evaluated by AlphaGo is the entire Go board. Input for the neural network was a graphic representation of the entire 19×19 Go board with all of the black and white stones in place. In effect, AlphaGo “looks” at the actual board and state of play, and then uses that complete pattern as one unit. A winning game ends with a board holding hundreds of stones; the board that preceded it held all the final stones save one, and so on. A few years ago no computer would have been able to handle the amount of data that AlphaGo was manipulating to “consider” board states. (Much of IBM’s Watson’s usefulness lies in its ability to ask questions and provide answers in human language. This natural language facility isn’t really a part of the core ‘thought processes’ going on in Watson, but it adds a huge amount of utility to the overall application. In a similar way, the ability of AlphaGo to use images of actual Go boards with their pieces in place adds an immense amount of utility to AlphaGo when it’s presented as a Go-playing application.)
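Treating the whole board as one input unit can be sketched as follows: each 19×19 position is encoded as stacked feature planes, the way convolutional Go programs typically feed boards to a network. AlphaGo’s real input used many more planes than the three shown here; this is only a minimal illustration of the idea.

```python
# Encode a whole Go board as one input unit: three 19x19 feature
# planes marking own stones, opponent stones, and empty points.
BOARD_SIZE = 19
EMPTY, BLACK, WHITE = ".", "B", "W"

def encode_board(board, to_play):
    opponent = WHITE if to_play == BLACK else BLACK
    def plane(stone):
        # 1.0 where the given stone type sits, 0.0 elsewhere.
        return [[1.0 if cell == stone else 0.0 for cell in row] for row in board]
    return [plane(to_play), plane(opponent), plane(EMPTY)]

# A nearly empty board with one stone of each color.
board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[3][3] = BLACK
board[15][15] = WHITE

planes = encode_board(board, to_play=BLACK)
print(len(planes), len(planes[0]), len(planes[0][0]))  # 3 19 19
print(planes[0][3][3], planes[1][15][15])              # 1.0 1.0
```

The network never sees “moves” as such; it sees whole board patterns like these and learns which patterns lead to wins.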

Note also that AlphaGo examined hundreds of thousands of Go games as it learned to identify likely next moves or board states that lead to a win. A few decades ago, it would have been impossible to obtain detailed examples of good Go games. The games played in major tourneys have always been recorded, but most Go games were not documented. All that changed with the invention of the Internet and the Web. Today many Go players play with Go software in the Cloud, and their moves are automatically captured. Similarly, many players exchange moves online, and many sites document games. Just as business and government organizations now have huge databases of emails and website-response logs that they can mine for patterns, today’s Go applications are able to draw on huge databases of Go games, and the team that developed AlphaGo drew on these databases when they initially trained AlphaGo using actual examples (i.e., supervised learning).

One key to understanding AlphaGo, and other deep neural network–based applications, is to understand the role of reinforcement learning. When we developed expert systems in the late 1980s, and a system failed to make a prediction that matched a human expert’s, the developers and the human expert spent days or even weeks poring over the hundreds of rules in the system to see where it went wrong. Then rules were changed and tests were run to see if specific rule changes would solve the problem. Making even a small change in a large expert system was a very labor-intensive and time-consuming job. AlphaGo, once it understood what a win meant, was able to play against a copy of itself and learn from every game it won. At the speed AlphaGo works, it can play a complete game with a copy of itself in a matter of seconds.
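The shape of that self-play loop can be shown with a toy example. The “policy” below is a single weight per move in a trivial made-up game; AlphaGo’s real policy is a deep network and its updates are gradient-based, but the loop is the same: play against yourself, then nudge the policy toward moves that appeared in wins and away from moves that appeared in losses.

```python
# Toy self-play reinforcement-learning loop.
import random

random.seed(0)

weights = {"a": 1.0, "b": 1.0, "c": 1.0}   # move preferences ("policy")
WINNING_MOVE = "c"                          # hidden rule of the toy game

def pick_move():
    # Sample a move with probability proportional to its weight.
    total = sum(weights.values())
    r = random.uniform(0, total)
    for move, w in weights.items():
        r -= w
        if r <= 0:
            return move
    return move

for game in range(2000):
    move = pick_move()
    if move == WINNING_MOVE:
        weights[move] *= 1.01   # reinforce moves seen in wins
    else:
        weights[move] *= 0.99   # discourage moves seen in losses

best = max(weights, key=weights.get)
print(best)   # "c": self-play discovers the winning move, no human needed
```

No developer ever inspected a rule to fix this policy; the win signal alone reshaped it, which is the contrast with expert-system maintenance that the paragraph draws.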

As already mentioned, AlphaGo defeated the leading European Go master in October 2015. In March 2016 it played the world Go champion. Predictably, the world Go champion studied AlphaGo’s October games to learn how AlphaGo plays. Unfortunately for him, AlphaGo had played millions of additional games—playing against a version of itself—since October, and significantly increased its ability to judge board states that lead to victory. Unlike the expert system development team, which was forced to figure out how its system failed and then make a specific improvement, the AlphaGo team simply put AlphaGo in learning mode and set it to playing games with a version of itself. Each time AlphaGo won, it adjusted the connection weights of its network to develop better approximations of the patterns that lead to victory. (Every so often the version of AlphaGo that it was playing against would be updated so it was as strong as the winning version of AlphaGo. That made subsequent games more challenging for AlphaGo and made its progress even more rapid.) AlphaGo is capable of playing a million Go games a day with itself when in reinforcement learning mode.

As impressive as AlphaGo’s October victory over Fan Hui was, it paled by comparison with AlphaGo’s win over the Go champion Lee Sedol in March of 2016. Fan Hui, the European Go Champion, while a very good player, was only ranked a 2-dan professional (he was ranked the 633rd best professional Go player in the world), while Lee was ranked a 9-dan professional and widely considered the strongest active player in the world. Experts, after examining the games that AlphaGo played against Fan Hui, were confident that Lee Sedol could easily defeat AlphaGo. (They informally ranked AlphaGo a 5-dan player.) In fact, when the match with Lee Sedol took place (4 months after the match with Fan Hui) everyone was amazed at how much better AlphaGo was. What the professional Go players failed to realize was that in the course of 4 months AlphaGo had played millions of games with itself, constantly improving its play. It was as if a human expert had managed to accumulate several additional lifetimes of experience between the October and the March matches. Lee Sedol, after he lost the second game, said that he was in shock and impressed that AlphaGo had played a near perfect game.

AlphaGo was designed to maximize the probability that it would win the game. Thus, if AlphaGo has to choose between a scenario where it will win by 20 points with an 80% probability and another where it will win by 2 points with a 99% probability, it will choose the second. This explains the combination of AlphaGo’s very aggressive middlegame play and its rather conservative play during the endgame. It may also explain the difficulties that Lee Sedol seemed to have when he reached the endgame and found many of the moves he wanted to make were already precluded.
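The difference between the two objectives is easy to make concrete. Using the hypothetical numbers from the paragraph above, a margin-maximizing player and a win-probability-maximizing player choose opposite moves:

```python
# Choosing a move by probability of winning versus margin of victory,
# using the two hypothetical scenarios quoted in the text.
candidates = [
    {"move": "aggressive", "win_prob": 0.80, "margin": 20},
    {"move": "safe",       "win_prob": 0.99, "margin": 2},
]

# A margin-maximizing player prefers the big win (expected margin 16 vs ~2)...
by_margin = max(candidates, key=lambda c: c["win_prob"] * c["margin"])
# ...but AlphaGo's objective only counts whether it wins at all.
by_win_prob = max(candidates, key=lambda c: c["win_prob"])

print(by_margin["move"])    # aggressive
print(by_win_prob["move"])  # safe
```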

To beat Lee Sedol, AlphaGo used 1,920 processors along with 280 GPUs – specialized chips capable of performing simple calculations in staggering quantities.

In spring 2017 AlphaGo was at it again, playing Chinese Grandmaster Ke Jie, and once again winning. Following that victory, the AlphaGo team announced that their program would “retire” and that Google would focus on more pressing human problems. Their work on helping clinicians diagnose patient problems faster, for example, is getting a lot of attention.

What was impressive about AlphaGo’s games with Ke Jie was not the wins, but the buzz around the innovations that AlphaGo introduced into its play. We have all become accustomed to the idea that AI systems can acquire vast amounts of knowledge and use that knowledge to solve problems. Many people, however, still imagine that the computer is doing something like a rapid search of a dictionary, looking up information as it is needed. In fact, AlphaGo learned to play Go by studying games played by human players. Then it improved its skills by playing millions of games against itself. In the process AlphaGo developed new insights into what worked and what didn’t work. AlphaGo has now begun to develop approaches—sequences of moves—that it uses over and over again in similar situations. Students of Go have noticed these characteristic sequences of moves, given them names, and are now beginning to study and copy them.

One of the sequences is being referred to as the “early 3-3 invasion.” (Roughly, this refers to a way to capture a corner of the board by playing around the point that is three spaces in from the two sides of a corner.) Corner play has been extensively studied by Go masters and—just as openings have been studied and catalogued in chess play—experts tend to agree on what corner play works well and what is to be avoided. Thus grandmasters were shocked when AlphaGo introduced a new approach to corner play—a slight variation on an approach that was universally thought to be ineffective—and proceeded to use it several times, proving that it was powerful and useful. Indeed, following AlphaGo’s latest round of games Go masters are carefully studying a number of different, new move sequences that AlphaGo has introduced. Significantly, in games just after his loss to AlphaGo, Chinese Grandmaster Ke Jie started using the early 3-3 invasion sequence in his own games.

All this may seem trivial stuff, but the bottom line is AlphaGo introduced serious innovations in its Go play. It isn’t just doing what human grandmasters have been doing; it’s going beyond them and introducing new ways of playing Go.

In essence, AlphaGo is an innovative player! What this means for the rest of us is really important. It means that when Google develops a patient-diagnostic assistant, and that assistant has studied the data on thousands or millions of patients, it will begin to suggest insights that are beyond or better than those currently achieved by human doctors.

The deep learning neural network technology that underlies today’s newest AI systems is considerably more powerful than the kinds of AI technologies we have used in the recent past. It can learn and it can generalize, try variations, and identify the variations that are even more powerful than those it was already using. These systems promise to not only automate human performance, but to automate innovation. This is both exciting and challenging. It suggests that organizations that move quickly and introduce these systems are going to be well placed to gain insights that will give them serious competitive advantages over their more staid competitors.

The Impact of Winning Games

Since AlphaGo succeeded in defeating human Go champions, the emphasis has switched from game playing to more practical applications. Google, for example, cut a deal with the UK health service and now has access to its medical records, which should result in some powerful diagnostic assistants in the near future, and Watson is being widely used for a variety of commercial applications. A new generation of AI has proven itself and is moving from playing games to creating practical applications.

The difference between the AI of the Eighties and the AI of today is significant.  In the Eighties it was hard to develop expert systems and they were difficult to maintain. Today’s neural network-based applications rely on the examination of massive databases to learn by themselves, and can thus constantly improve themselves. Moreover, where AI in the Eighties was narrowly focused on applications that performed like human experts in narrow domains, today’s AI includes not only knowledge-based applications, but intelligent natural language and visual applications. 

Consider the currently popular topic of automated automobiles. A variety of companies are working on the problem, and we expect automated cars to become available in this decade. An automated car must begin by scanning the environment, using a visual system that lets it detect roadways, other cars, and people. It must rely on GPS and maps to plot routes. It must use knowledge of traffic rules to execute its moves in a rapidly changing environment. Most automated cars will probably use natural language to communicate with passengers to obtain directions and to provide feedback. Some of the specific applications will depend on neural networks. Others will probably depend on knowledge rules. In all, an automated car will employ a variety of different AI applications all working together – just as AlphaGo used a variety of different neural nets to perform its various tasks. The game-playing AI applications of the last few years have provided a sound foundation for the next generation of commercial AI applications.


Bibliography

Harmon, Paul, and David King, Expert Systems: Artificial Intelligence in Business, Wiley, 1985.  For a good overview of the interest in AI in the 1980s.

Hsu, Feng-hsiung, Behind Deep Blue, Princeton University Press, 2002.  For a good review of IBM’s chess playing application and its victories.

Hall, Curt, “How Smart Is Watson, and What Is Its Significance to BI and DSS?” Advisor, Cutter Consortium, March 1, 2011.  For a review of IBM’s Watson.

Silver, David et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, Vol. 529, Issue 7587, pp. 484–489, January 27, 2016.  A technical review of AlphaGo.

Mnih, Volodymyr et al., “Playing Atari with Deep Reinforcement Learning,” NIPS Deep Learning Workshop, 2013.  A detailed description of the Go innovations that AlphaGo has introduced is available at http://deepmind.com/blog/innovations-alphago.


Paul Harmon is the executive editor of Business Process Trends (www.bptrends.com) and a Senior Consultant at Cutter Consortium providing advice on AI developments.  He is the author of Expert Systems: AI for Business and of Business Process Change, which is now in its 4th edition.

© Copyright 2019 by Paul Harmon. All Rights Reserved.
