Specification Gaming: The Flip Side Of AI Ingenuity

Újra kiadta Platón

Követő: 0

At first sight, these kinds of examples may seem amusing but less interesting, and irrelevant to deploying agents in the real world, where there are no simulator bugs. However, the underlying problem isn’t the bug itself but a failure of abstraction that can be exploited by the agent. In the example above, the robot’s task was misspecified because of incorrect assumptions about simulator physics. Analogously, a real-world traffic optimisation task might be misspecified by incorrectly assuming that the traffic routing infrastructure does not have software bugs or security vulnerabilities that a sufficiently clever agent could discover. Such assumptions need not be made explicitly – more likely, they are details that simply never occurred to the designer. And, as tasks grow too complex to consider every detail, researchers are more likely to introduce incorrect assumptions during specification design. This poses the question: is it possible to design agent architectures that correct for such false assumptions instead of gaming them?

One assumption commonly made in task specification is that the task specification cannot be affected by the agent’s actions. This is true for an agent running in a sandboxed simulator, but not for an agent acting in the real world. Any task specification has a physical manifestation: a reward function stored on a computer, or preferences stored in the head of a human. An agent deployed in the real world can potentially manipulate these representations of the objective, creating a jutalom manipulálása problem. For our hypothetical traffic optimisation system, there is no clear distinction between satisfying the user’s preferences (e.g. by giving useful directions), and a felhasználók befolyásolása könnyebben kielégíthető preferenciái legyenek (pl. úgy, hogy rábírják őket, hogy könnyebben elérhető úti célokat válasszanak). Az előbbi kielégíti a célt, míg az utóbbi manipulálja a cél reprezentációját a világban (a felhasználói preferenciákat), és mindkettő magas jutalmat eredményez az AI-rendszer számára. Egy másik, szélsőségesebb példaként egy nagyon fejlett AI-rendszer eltérítheti a számítógépet, amelyen fut, és manuálisan magas értékre állítja a jutalomjelet.

Forrás: https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity

Időbélyeg: April 21, 2020