Deep Reinforcement Learning

Republished By Plato

We have subsequently improved the DQN algorithm in many ways: further stabilising the learning dynamics; prioritising the replayed experiences; normalising, aggregating and re-scaling the outputs. Combining several of these improvements together led to a 300% improvement in mean score across Atari games; human-level performance has now been achieved in almost all of the Atari games. We can even train a single neural network to learn about multiple Atari games. We have also built a massively distributed deep RL system, known as Gorila, that utilises the Google Cloud platform to speed up training time by an order of magnitude; this system has been applied to recommender systems within Google.
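To make "prioritising the replayed experiences" concrete, here is a minimal sketch of proportional prioritisation: transitions with a larger temporal-difference error are replayed more often, and importance weights compensate for the non-uniform sampling. The helper names, the toy buffer, and the values of `alpha`, `beta` and `eps` are illustrative assumptions, not taken from the post.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) is proportional to (|td_error_i| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def importance_weights(probs, beta=0.4):
    """Weights (N * P(i)) ** -beta, rescaled so the largest weight is 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]

def sample_batch(buffer, probs, batch_size, rng):
    """Draw indices with replacement according to the priorities."""
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return indices, [buffer[i] for i in indices]

# Hypothetical replay buffer: placeholder transitions and their TD errors.
buffer = ["t0", "t1", "t2", "t3"]
td_errors = [0.1, 2.0, 0.5, 0.0]

probs = sampling_probabilities(td_errors)
weights = importance_weights(probs)
rng = random.Random(0)
indices, batch = sample_batch(buffer, probs, 3, rng)
```

With `alpha = 0` the sampling falls back to uniform replay, so `alpha` controls how strongly the buffer favours surprising transitions.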

However, deep Q-networks are only one way to solve the deep RL problem. We recently introduced an even more practical and effective method based on asynchronous RL. This approach exploits the multithreading capabilities of standard CPUs. The idea is to execute many instances of our agent in parallel, all sharing a single model. This provides a viable alternative to experience replay, since parallelisation also diversifies and decorrelates the data. Our asynchronous advantage actor-critic algorithm, A3C, combines a deep value network (the critic) with a deep policy network (the actor) for selecting actions. It achieves state-of-the-art results, using a fraction of the training time of DQN and a fraction of the resource consumption of Gorila. By building novel approaches to intrinsic motivation and temporally abstract planning, we have also achieved breakthrough results in the most notoriously challenging Atari games, such as Montezuma’s Revenge.
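The asynchronous idea can be sketched on a toy problem: several threads each run their own copy of an environment and push actor-critic updates into one shared model. Everything here is an assumption made for illustration: the three-action environment `ARM_MEANS`, the tabular `SharedModel` standing in for the neural networks, the learning rates, and the lock. A3C as published uses deep networks, multi-step returns and lock-free updates, so this is only a picture of the pattern, not the algorithm itself.

```python
import math
import random
import threading

# Hypothetical toy environment: three actions with noisy rewards, no states.
ARM_MEANS = [0.1, 0.5, 0.9]

class SharedModel:
    """Parameters shared by all workers (stand-in for the shared networks)."""
    def __init__(self, n_actions):
        self.prefs = [0.0] * n_actions  # actor: softmax action preferences
        self.value = 0.0                # critic: estimate of expected reward
        self.lock = threading.Lock()    # simplification for this sketch

    def policy(self):
        top = max(self.prefs)
        exps = [math.exp(p - top) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

def worker(model, seed, steps, lr_actor=0.1, lr_critic=0.1):
    rng = random.Random(seed)  # each worker sees its own stream of experience
    for _ in range(steps):
        probs = model.policy()
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        reward = rng.gauss(ARM_MEANS[action], 0.1)
        with model.lock:
            advantage = reward - model.value       # how much better than expected
            model.value += lr_critic * advantage   # critic update
            for a in range(len(probs)):            # policy-gradient actor update
                indicator = 1.0 if a == action else 0.0
                model.prefs[a] += lr_actor * advantage * (indicator - probs[a])

model = SharedModel(len(ARM_MEANS))
threads = [threading.Thread(target=worker, args=(model, seed, 2000))
           for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_policy = model.policy()
```

Because each worker draws from its own random stream, the updates arriving at the shared model come from different, largely uncorrelated experience, which is the role experience replay plays in DQN.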

While Atari games are highly diverse, they are limited to 2D sprite-based video games. We have recently introduced Labyrinth: a challenging suite of 3D navigation and puzzle-solving environments. Again, the agent only observes pixel-based inputs from its immediate field-of-view, and must figure out the map to discover and exploit rewards.

Source: https://deepmind.com/blog/article/deep-reinforcement-learning

Time Stamp: June 16, 2016

Time Stamp: Apr 11, 2018
