DeepMind Says New Multi-Game AI Is a Step Toward More General Intelligence


AI has mastered some of the most complex games known to man, but models are generally tailored to solve specific kinds of challenges. A new DeepMind algorithm that can tackle a much wider variety of games could be a step towards more general AI, its creators say.

Using games as a benchmark for AI has a long pedigree. When IBM’s Deep Blue algorithm beat chess world champion Garry Kasparov in 1997, it was hailed as a milestone for the field. Similarly, when DeepMind’s AlphaGo defeated one of the world’s top Go players, Lee Sedol, in 2016, it led to a flurry of excitement about AI’s potential.

DeepMind built on this success with AlphaZero, a single model that mastered several board games, including Go, chess, and shogi. But as impressive as this was, AlphaZero only worked with perfect information games, where every detail of the game, other than the opponent’s intentions, is visible to both players. This includes games like Go and chess, where both players can always see all the pieces on the board.

In contrast, imperfect information games involve some details being hidden from the other player. Poker is a classic example because players can’t see what hands their opponents are holding. There are now models that can beat professionals at these kinds of games too, but they use an entirely different approach than algorithms like AlphaZero.

Now, researchers at DeepMind have combined elements of both approaches to create a model that can beat humans at chess, Go, and poker. The team claims the breakthrough could accelerate efforts to create more general AI algorithms that can learn to solve a wide variety of tasks.

Researchers building AI to play perfect information games have generally relied on an approach known as tree search. This explores a multitude of ways the game could progress from its current state, with different branches mapping out potential sequences of moves. AlphaGo combined tree search with a machine learning technique in which the model refines its skills by playing itself repeatedly and learning from its mistakes.
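To make the idea of tree search concrete, here is a minimal sketch in Python. It is not AlphaGo's method, which pairs Monte Carlo tree search with neural networks; it is a plain exhaustive search on a toy game chosen purely for illustration. In this assumed game, players alternate removing one to three stones from a pile, and whoever takes the last stone wins. The function names `search` and `best_move` are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed toy rule: a player may take 1, 2, or 3 stones


@lru_cache(maxsize=None)
def search(stones: int) -> int:
    """Return +1 if the player to move can force a win, otherwise -1."""
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1
    # Each legal move is a branch. The opponent's outcome in the resulting
    # position is the negation of ours, so we maximize the negated value.
    return max(-search(stones - take) for take in MOVES if take <= stones)


def best_move(stones: int) -> int:
    """Pick the move whose branch leads to the best outcome for the mover."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -search(stones - take))


if __name__ == "__main__":
    for pile in (4, 5, 6, 7):
        outcome = "win" if search(pile) == 1 else "loss"
        print(f"pile={pile}: {outcome} for the mover, suggested move {best_move(pile)}")
```

A toy game like this can be searched to the end. Games such as chess and Go have far too many branches for that, which is why systems in the AlphaGo family use learned evaluations to decide which branches are worth exploring.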

When it comes to imperfect information games, researchers tend to instead rely on game theory, using mathematical models to map out the most rational solutions to strategic problems. Game theory is used extensively in economics to understand how people make choices in different situations, many of which involve imperfect information.
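As a rough illustration of the game-theoretic side, the sketch below uses regret matching, a standard learning rule that underlies counterfactual regret minimization, a family of methods widely used in poker AI. It is not the algorithm from the article, and the two-by-two payoff matrix is a made-up toy game rather than poker. Two players repeatedly adjust their mixed strategies toward the actions they regret not having played, and their average strategies approach an equilibrium. The names `regret_matching_strategy` and `solve_zero_sum` are hypothetical.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to a uniform mix.
    return [1.0 / len(regrets)] * len(regrets)


def solve_zero_sum(payoff, iterations=20000):
    """Approximate equilibrium strategies for a two-player zero-sum game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative of it.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching_strategy(row_regret)
        col = regret_matching_strategy(col_regret)

        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_expected = sum(row[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret: how much better each action would have done than the mix played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += col[j]

    # The average strategy over all iterations is what approaches equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


if __name__ == "__main__":
    toy_game = [[2, -1], [-1, 1]]  # assumed payoffs, for illustration only
    row_mix, col_mix = solve_zero_sum(toy_game)
    print("row player's average strategy:", row_mix)
    print("column player's average strategy:", col_mix)
```

For this toy matrix, the exact equilibrium has each player choosing their first action with probability 2/5, and the averaged strategies should land close to that. Neither player can settle on a single predictable action without being exploited, which is the same pressure that makes bluffing rational in poker.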

In 2016, an AI called DeepStack beat human professionals at heads-up no-limit Texas hold’em poker, but the model was highly specialized for that particular game. Much of the DeepStack team now works at DeepMind, however, and they’ve combined the techniques they used to build DeepStack with those used in AlphaZero.

The new algorithm, called Student of Games, uses a combination of tree search, self-play, and game theory to tackle both perfect and imperfect information games. In a paper in Science Advances, the researchers report that the algorithm beat the best openly available poker-playing AI, Slumbot, and could also play Go and chess at the level of a human professional, though it couldn’t match specialized algorithms like AlphaZero.

But being a jack-of-all-trades rather than a master of one is arguably a bigger prize in AI research. While deep learning can often achieve superhuman performance on specific tasks, developing more general forms of AI that can be applied to a wide range of problems is trickier. The researchers say a model that can tackle both perfect and imperfect information games is “an important step toward truly general algorithms for arbitrary environments.”

It’s important not to extrapolate too much from the results, Michael Rovatsos from the University of Edinburgh, UK, told New Scientist. The AI was still operating within the simple and controlled environment of a game, where the number of possible actions is limited and the rules are clearly defined. That’s a far cry from the messy realities of the real world.

But even if this is a baby step, being able to combine the leading approaches to two very different kinds of game in a single model is a significant achievement, and one that could serve as a blueprint for more capable and general models in the future.

Image Credit: Hassan Pasha / Unsplash
