Efficient Video-Text Learning with Iterative Co-tokenization

1 · Google AI Research · Aug. 9, 2022, 6:02 p.m.
Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Research, Brain Team Video is an ubiquitous source of media content that touches on many aspects of people’s day-to-day lives. Increasingly, real-world video applications, such as video captioning, video content analysis, and video question-answering (VideoQA), rely on models that can connect video content with text or natural language. VideoQA is particularly challenging, however, as it requires grasping both semantic in...