Self-modification of policy and utility function in rational agents

Tom Everitt*, Daniel Filan, Mayank Daswani, Marcus Hutter

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    10 Citations (Scopus)

    Abstract

    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify – for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.

    Original language: English
    Title of host publication: Artificial General Intelligence - 9th International Conference, AGI 2016, Proceedings
    Editors: Bas Steunebrink, Pei Wang, Ben Goertzel
    Publisher: Springer Verlag
    Pages: 1-11
    Number of pages: 11
    ISBN (Print): 9783319416489
    DOIs
    Publication status: Published - 2016
    Event: 9th International Conference on Artificial General Intelligence, AGI 2016 - New York, United States
    Duration: 16 Jul 2016 → 19 Jul 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9782
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 9th International Conference on Artificial General Intelligence, AGI 2016
    Country/Territory: United States
    City: New York
    Period: 16/07/16 → 19/07/16
