Avoiding wireheading with value reinforcement learning

Tom Everitt*, Marcus Hutter

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    18 Citations (Scopus)

    Abstract

    How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
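    The contrast the abstract draws can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's formalism: all names, actions, and probabilities below are invented. A plain RL agent maximises the observed reward and so prefers the sensor-corrupting "wirehead" action, while a value-learning agent treats reward as evidence about a latent utility function and discounts reward it believes to be corrupted.

    ```python
    # Toy illustration (hypothetical, not the paper's formalism) of the
    # wireheading problem: a reward-maximising agent prefers tampering
    # with its reward sensor, while a value-learning agent, which treats
    # reward as *evidence* about a latent utility function, does not.

    # Observed reward per action; "wirehead" corrupts the sensor.
    observed_reward = {"work": 1.0, "wirehead": 10.0}

    # Agent's belief that each action leaves the reward signal as valid
    # evidence of utility; wireheading is believed to corrupt the signal.
    p_signal_valid = {"work": 0.99, "wirehead": 0.05}

    def rl_choice(actions):
        """A plain RL agent maximises observed reward."""
        return max(actions, key=lambda a: observed_reward[a])

    def vrl_choice(actions):
        """A value-learning agent weights reward by its belief that the
        signal is uncorrupted, so wireheading looks unattractive."""
        expected_utility = {
            a: p_signal_valid[a] * observed_reward[a] for a in actions
        }
        return max(actions, key=lambda a: expected_utility[a])

    actions = ["work", "wirehead"]
    print(rl_choice(actions))   # wirehead: 10.0 beats 1.0
    print(vrl_choice(actions))  # work: 0.99 beats 0.5
    ```

    The toy belief term `p_signal_valid` stands in for the paper's constraint over the agent's belief distributions; no explicit list of "wireheading actions" is needed, only a belief about whether the signal remains valid evidence.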

    Original language: English
    Title of host publication: Artificial General Intelligence - 9th International Conference, AGI 2016, Proceedings
    Editors: Bas Steunebrink, Pei Wang, Ben Goertzel
    Publisher: Springer Verlag
    Pages: 12-22
    Number of pages: 11
    ISBN (Print): 9783319416489
    Publication status: Published - 2016
    Event: 9th International Conference on Artificial General Intelligence, AGI 2016 - New York, United States
    Duration: 16 Jul 2016 - 19 Jul 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9782
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 9th International Conference on Artificial General Intelligence, AGI 2016
    Country/Territory: United States
    City: New York
    Period: 16/07/16 - 19/07/16

