Efficient Video-Text Learning with Iterative Co-tokenization

Google AI Research · Aug. 9, 2022, 6:02 p.m.

Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Research, Brain Team

Video is a ubiquitous source of media content that touches on many aspects of people’s day-to-day lives. Increasingly, real-world video applications, such as video captioning, video content analysis, and video question-answering (VideoQA), rely on models that can connect video content with text or natural language. VideoQA is particularly challenging, however, as it requires grasping both semantic in...

Read full post on ai.googleblog.com