TY - JOUR
T1 - Improving Driver Gaze Prediction with Reinforced Attention
AU - Lv, Kai
AU - Sheng, Hao
AU - Xiong, Zhang
AU - Li, Wei
AU - Zheng, Liang
PY - 2021
Y1 - 2021
N2 - We consider the task of driver gaze prediction: estimating where a driver's focus should be located, based on raw video of the outside environment. In practice, we output a probability map that gives the normalized probability of each point in a given scene being the object of the driver's attention. Most existing methods (e.g., Coarse-to-Fine and Multi-branch) take an image or a video as input and directly output the fixation map. While successful, these methods often produce highly scattered predictions, rendering them unreliable for real-world use. Motivated by this observation, we propose the reinforced attention (RA) model as a regulatory mechanism that increases prediction density. Our method is built directly on top of existing methods, making it complementary to current approaches. Specifically, we first use Multi-branch to obtain an initial fixation map. Then, RA is trained with deep reinforcement learning to learn a location prediction policy, producing a reinforced attention map. Finally, we combine the fixation map and the reinforced attention map via mask-guided multiplication to obtain the final gaze prediction. Experimental results show that our framework improves gaze prediction accuracy and achieves state-of-the-art performance on the DR(eye)VE dataset.
AB - We consider the task of driver gaze prediction: estimating where a driver's focus should be located, based on raw video of the outside environment. In practice, we output a probability map that gives the normalized probability of each point in a given scene being the object of the driver's attention. Most existing methods (e.g., Coarse-to-Fine and Multi-branch) take an image or a video as input and directly output the fixation map. While successful, these methods often produce highly scattered predictions, rendering them unreliable for real-world use. Motivated by this observation, we propose the reinforced attention (RA) model as a regulatory mechanism that increases prediction density. Our method is built directly on top of existing methods, making it complementary to current approaches. Specifically, we first use Multi-branch to obtain an initial fixation map. Then, RA is trained with deep reinforcement learning to learn a location prediction policy, producing a reinforced attention map. Finally, we combine the fixation map and the reinforced attention map via mask-guided multiplication to obtain the final gaze prediction. Experimental results show that our framework improves gaze prediction accuracy and achieves state-of-the-art performance on the DR(eye)VE dataset.
KW - Gaze prediction
KW - deep learning
KW - driver attention
KW - reinforcement learning
KW - video processing
UR - http://www.scopus.com/inward/record.url?scp=85097174963&partnerID=8YFLogxK
U2 - 10.1109/TMM.2020.3038311
DO - 10.1109/TMM.2020.3038311
M3 - Article
SN - 1520-9210
VL - 23
SP - 4198
EP - 4207
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -