A long-term goal of AI research is to build intelligent agents that can perceive the rich visual environment around us and communicate this understanding in natural language to humans and other agents. To this end, recent advances at the intersection of vision and language have made remarkable progress: from generating natural language descriptions of images and videos, to answering questions about them, to holding free-form conversations about visual content. Vision-language research has attracted researchers from several communities, including computer vision, natural language processing, and machine learning.
This workshop proposes to gather these researchers to form a new vision-language community and to attract more people to the topic. We will invite several researchers from this area to present their most recent work, and the workshop will conclude with an open panel discussion.
The goal of this workshop is to provide a comprehensive yet accessible overview of existing work and to lower the entry barrier for new researchers. We also aim to invite speakers from this area to present their latest work and to propose new challenges. Overall, the topics we will cover in this workshop are as follows:
- visual captioning, dialogue, and question answering
- sequence learning towards bridging vision and language
- novel tasks which combine language and vision
- understanding the relationship between language and vision in humans
- language as a mechanism to structure and reason about visual perception
- language as a learning bias to aid vision in both machines and humans
- dialogue as a means of sharing knowledge about visual perception
- stories as a means of abstraction
- transfer learning across language and vision
- reasoning visually about language problems
- visual synthesis from language
- joint video and language alignment and parsing