TY - JOUR
T1 - Intelligence and Unambitiousness Using Algorithmic Information Theory
AU - Cohen, Michael
AU - Vellambi, Badri N.
AU - Hutter, Marcus
PY - 2021
N2 - Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary power in order to intervene in the provision of their own reward. We review the arguments that generally intelligent algorithmic-information-theoretic reinforcement learners such as Hutter's [2] AIXI would seek arbitrary power, including over us. Then, using an information-theoretic exploration schedule, and a setup inspired by causal influence theory, we present a variant of AIXI which learns to not seek arbitrary power; we call it unambitious. We show that our agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability. And given a formal assumption that we probe empirically, we show that eventually, the agent's world-model incorporates the following true fact: intervening in the outside world will have no effect on reward acquisition; hence, it has no incentive to shape the outside world.
DO - 10.1109/JSAIT.2021.3073844
M3 - Article
VL - 2
SP - 678
EP - 690
JO - IEEE Journal on Selected Areas in Information Theory
JF - IEEE Journal on Selected Areas in Information Theory
IS - 2
ER -