Abstract
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-armed bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
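The kind of feedback the abstract describes, where a single action reveals information about many interventions at once, can be made concrete with a toy simulation. The sketch below is hypothetical and is not the paper's algorithm: it assumes a "parallel" model of N independent binary causes of the reward, in which a purely observational pull reveals all variables, so its reward can be credited to every arm do(X_i = x_i) consistent with what was observed. All names, parameters, and the alternating observe/intervene allocation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T = 10, 2000               # number of binary causes, pull budget (assumed)
q = rng.uniform(0.1, 0.9, N)  # P(X_i = 1) when X_i is not intervened on

def reward(x):
    # Y depends only on X_0, so do(X_0 = 1) is the best intervention.
    return float(rng.random() < 0.5 + 0.4 * x[0])

# Reward estimates for each interventional arm do(X_i = j).
sums = np.zeros((N, 2))
counts = np.zeros((N, 2))

for t in range(T):
    x = (rng.random(N) < q).astype(int)  # sample all causes observationally
    if t % 2 == 0:
        # Observational pull: because the X_i are independent in this toy
        # model, the observed reward is an unbiased sample for every arm
        # do(X_i = x_i), so one pull updates N estimates at once.
        y = reward(x)
        sums[np.arange(N), x] += y
        counts[np.arange(N), x] += 1
    else:
        # Standard interventional pull on a uniformly random arm do(X_i = j).
        i, j = rng.integers(N), rng.integers(2)
        x[i] = j
        y = reward(x)
        sums[i, j] += y
        counts[i, j] += 1

means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
i_hat, j_hat = np.unravel_index(np.argmax(means), means.shape)

# True interventional means for this toy model, used only to score regret.
true = np.array([[0.5 + 0.4 * (j if i == 0 else q[0]) for j in (0, 1)]
                 for i in range(N)])
print(f"recommended arm: do(X_{i_hat} = {j_hat})")
print(f"simple regret:  {true.max() - true[i_hat, j_hat]:.4f}")
```

Note that reusing observational samples this way is only unbiased here because the causes are independent; under confounding, E[Y | X_i = x] and E[Y | do(X_i = x)] generally differ, which is precisely the gap a causal model is needed to bridge.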
| Original language | English |
|---|---|
| Pages (from-to) | 1189-1197 |
| Number of pages | 9 |
| Journal | Advances in Neural Information Processing Systems |
| Publication status | Published - 2016 |
| Event | 30th Annual Conference on Neural Information Processing Systems, NIPS 2016, Barcelona, Spain. Duration: 5 Dec 2016 → 10 Dec 2016 |