TY - JOUR
T1 - Quantum reinforcement learning
AU - Dong, Daoyi
AU - Chen, Chunlin
AU - Li, Hanxiong
AU - Tarn, Txyh Jong
PY - 2008
Y1 - 2008
N2 - The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speedup learning through the quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration on the application of quantum computation to artificial intelligence.
AB - The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speedup learning through the quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration on the application of quantum computation to artificial intelligence.
KW - Collapse
KW - Grover iteration
KW - Probability amplitude
KW - Quantum reinforcement learning (QRL)
KW - State superposition
UR - http://www.scopus.com/inward/record.url?scp=49049104480&partnerID=8YFLogxK
U2 - 10.1109/TSMCB.2008.925743
DO - 10.1109/TSMCB.2008.925743
M3 - Article
C2 - 18784007
AN - SCOPUS:49049104480
SN - 1083-4419
VL - 38
SP - 1207
EP - 1220
JO - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
JF - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
IS - 5
ER -