Connecting language and vision to actions

Peter Anderson, Abhishek Das, Qi Wu

    Research output: Contribution to journal › Meeting Abstract

    Abstract

    A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment. To this end, recent advances at the intersection of language and vision have made incredible progress - from generating natural language descriptions of images and videos, to answering questions about them, to even holding free-form conversations about visual content! However, while these agents can passively describe images or answer (a sequence of) questions about them, they cannot act in the world (what if I cannot answer a question from my current view, or I am asked to move or manipulate something?). Thus, the challenge now is to extend this progress in language and vision to embodied agents that take actions and actively interact with their visual environments.

    © 2018 Association for Computational Linguistics

