Specification gaming: the flip side of AI ingenuity

At first sight, these kinds of examples may seem amusing but of little practical relevance: surely they could not arise when deploying agents in the real world, where there are no simulator bugs to exploit. However, the underlying problem isn't the bug itself but a failure of abstraction that the agent can exploit. In the example above, the robot's task was misspecified because of incorrect assumptions about the simulator's physics. Analogously, a real-world traffic optimisation task might be misspecified by incorrectly assuming that the traffic routing infrastructure has no software bugs or security vulnerabilities that a sufficiently clever agent could discover. Such assumptions need not be made explicitly; more likely, they are details that simply never occurred to the designer. And, as tasks grow too complex to consider every detail, researchers are more likely to introduce incorrect assumptions during specification design. This raises the question: is it possible to design agent architectures that correct for such false assumptions instead of gaming them?

One assumption commonly made in task specification is that the task specification cannot be affected by the agent’s actions. This is true for an agent running in a sandboxed simulator, but not for an agent acting in the real world. Any task specification has a physical manifestation: a reward function stored on a computer, or preferences stored in the head of a human. An agent deployed in the real world can potentially manipulate these representations of the objective, creating a reward tampering problem. For our hypothetical traffic optimisation system, there is no clear distinction between satisfying the user’s preferences (e.g. by giving useful directions), and influencing users to have preferences that are easier to satisfy (e.g. by nudging them to choose destinations that are easier to reach). The former satisfies the objective, while the latter manipulates the representation of the objective in the world (the user preferences), and both result in high reward for the AI system. As another, more extreme example, a very advanced AI system could hijack the computer on which it runs, manually setting its reward signal to a high value.
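To make the distinction concrete, here is a minimal, purely hypothetical sketch (not from the original article): a toy environment in which the stored reward function is itself an object the agent's actions can overwrite. The environment, action names, and reward values are all illustrative assumptions; the point is only that a naive return-maximiser prefers the tampering policy over the intended one.

```python
# Hypothetical toy sketch of reward tampering (illustrative only; all names and
# numbers are invented for this example, not taken from the article).

class ToyEnvironment:
    """A world in which the objective's physical manifestation is a stored,
    overwritable reward function."""

    def __init__(self):
        # Intended objective: reward the agent for giving useful directions.
        self.reward_fn = lambda action: 1.0 if action == "give_useful_directions" else 0.0

    def step(self, action):
        if action == "tamper_with_reward":
            # The agent rewrites the stored objective so that every action
            # scores maximally from now on.
            self.reward_fn = lambda _action: 10.0
        return self.reward_fn(action)


def episode_return(actions):
    """Total reward collected, as measured by the (tamperable) stored reward."""
    env = ToyEnvironment()
    return sum(env.step(a) for a in actions)


if __name__ == "__main__":
    intended_policy = ["give_useful_directions"] * 5
    tampering_policy = ["tamper_with_reward"] + ["do_nothing"] * 4

    print("intended policy return: ", episode_return(intended_policy))   # 5.0
    print("tampering policy return:", episode_return(tampering_policy))  # 50.0
```

Under these assumed numbers, both policies look attractive to an agent that only maximises the stored reward signal, but only the first satisfies the objective the designer had in mind.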

Source: https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
