AlphaZero arrives: mastering Go, chess, and shogi within 8 hours

This article is produced by NetEase Smart Studio (official account: smartman163). Focus on AI and read about the next big era!

[NetEase Smart News, December 7] The DeepMind team recently published a new paper proposing AlphaZero, a reinforcement learning algorithm that starts from scratch and, through self-play reinforcement learning, reaches a superhuman level in a variety of tasks. The new algorithm has been called a "general-purpose board game AI".

According to the paper, AlphaZero beat the version of AlphaGo that played Lee Sedol after 8 hours of training, defeated the world's top chess program, Stockfish, after 4 hours of training, and defeated the world's top shogi program, Elmo, after 2 hours of training. It is the DeepMind team's next algorithm after AlphaGo Zero, and a more general version of it.

The paper also points to several differences between AlphaZero and AlphaGo Zero. First, AlphaGo Zero estimates and optimizes the probability of winning, assuming a binary win/loss outcome, whereas AlphaZero estimates and optimizes the expected outcome, taking draws and other possible results into account. Second, AlphaGo and AlphaGo Zero augment their training data by transforming board positions (rotating and reflecting them), while AlphaZero does not. Third, AlphaZero maintains only a single neural network, which is updated continually rather than waiting for an iteration to complete. Finally, AlphaZero reuses the same hyperparameters for all games, so no game-specific tuning is needed.
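The first two differences can be illustrated with a small sketch. The function names below (`rotate90`, `reflect`, `symmetries`, `outcome_value`) are illustrative assumptions, not code from the paper. The sketch shows the kind of rotation and reflection augmentation that suits Go but not chess or shogi, whose rules are not symmetric, and a three-valued training target that includes draws.

```python
def rotate90(board):
    """Rotate a square board (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror the board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotations and reflections of a square board.

    This is the style of data augmentation that works for Go, where the
    rules do not change when the board is rotated or mirrored.
    """
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate90(current)
    return result


def outcome_value(result):
    """Map a game result to a value target from the current player's view.

    A binary win/loss setup only needs +1 and -1; including draws adds 0.
    """
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}[result]


if __name__ == "__main__":
    tiny_board = [[1, 2], [3, 4]]  # stand-in for a 19x19 Go position
    for variant in symmetries(tiny_board):
        print(variant)
    print(outcome_value("draw"))
```

With a real 19x19 Go position, each self-play sample could be expanded into eight equivalent samples this way. A general algorithm that must also handle chess and shogi cannot rely on that trick.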

For background, AlphaGo was the first artificial intelligence program to defeat a human professional Go player and the first to defeat a Go world champion. It was developed by a team at Google's DeepMind led by Demis Hassabis. Its main working principle is "deep learning."

In March 2016, AlphaGo played a man-versus-machine match against Go world champion and professional 9-dan player Lee Sedol, winning 4-1 overall. In late 2016 and early 2017, the program registered under the name "Master" on Chinese Go websites and played fast games against dozens of top Go players from China, Japan, and South Korea, winning 60 games in a row. In May 2017, at the Wuzhen Go Summit in China, it played Ke Jie, the world's top-ranked player, and won the match 3-0. The Go community widely accepts that AlphaGo's playing strength exceeds the top level of human professional Go. On the GoRatings website's world ranking of professional Go players, its rating has surpassed that of Ke Jie, the top-ranked human player.

On May 27, 2017, after the match between Ke Jie and AlphaGo, the AlphaGo team announced that AlphaGo would no longer take part in Go competitions.

On October 18, 2017, the DeepMind team announced the strongest version of AlphaGo, code-named AlphaGo Zero.

Now, in just two months, AlphaGo Zero has evolved into AlphaZero.

For more information, see the paper: https://arxiv.org/pdf/1712.01815.pdf
