TY - GEN
T1 - Learning to Select Views for Efficient Multi-View Understanding
AU - Hou, Yunzhong
AU - Gould, Stephen
AU - Zheng, Liang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Multiple camera view (multi-view) setups have proven useful in many computer vision applications. However, the high computational cost associated with multiple views creates a significant challenge for end devices with limited computational resources. In modern CPUs, pipelining breaks a longer job into steps and enables parallelism over sequential steps from multiple jobs. Inspired by this, we study selective view pipelining for efficient multi-view understanding, which breaks computation of multiple views into steps, and only computes the most helpful views/steps in a parallel manner for the best efficiency. To this end, we use reinforcement learning to learn a very light view selection module that analyzes the target object or scenario from initial views and selects the next-best-view for recognition or detection for pipeline computation. Experimental results on multi-view classification and detection tasks show that our approach achieves promising performance while using only 2 or 3 out of N available views, significantly reducing computational costs while maintaining parallelism over GPU through selective view pipelining. Code available at https://github.com/hou-yz/MVSelect.
AB - Multiple camera view (multi-view) setups have proven useful in many computer vision applications. However, the high computational cost associated with multiple views creates a significant challenge for end devices with limited computational resources. In modern CPUs, pipelining breaks a longer job into steps and enables parallelism over sequential steps from multiple jobs. Inspired by this, we study selective view pipelining for efficient multi-view understanding, which breaks computation of multiple views into steps, and only computes the most helpful views/steps in a parallel manner for the best efficiency. To this end, we use reinforcement learning to learn a very light view selection module that analyzes the target object or scenario from initial views and selects the next-best-view for recognition or detection for pipeline computation. Experimental results on multi-view classification and detection tasks show that our approach achieves promising performance while using only 2 or 3 out of N available views, significantly reducing computational costs while maintaining parallelism over GPU through selective view pipelining. Code available at https://github.com/hou-yz/MVSelect.
UR - http://www.scopus.com/inward/record.url?scp=85207279868&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.01903
DO - 10.1109/CVPR52733.2024.01903
M3 - Conference contribution
AN - SCOPUS:85207279868
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 20135
EP - 20144
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -