ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Google AI Research · May 11, 2021, 9:44 p.m.
Posted by Chao Jia and Yinfei Yang, Software Engineers, Google Research

Learning good visual and vision-language representations is critical to solving computer vision problems such as image retrieval, image classification, and video understanding, and can enable the development of tools and products that change people’s daily lives. For example, a good vision-language matching model can help users find the most relevant images given a text description or an image input and help tools such as Google Len...