ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Google AI Research · May 11, 2021, 9:44 p.m.

Posted by Chao Jia and Yinfei Yang, Software Engineers, Google Research

Learning good visual and vision-language representations is critical to solving computer vision problems (image retrieval, image classification, video understanding) and can enable the development of tools and products that change people’s daily lives. For example, a good vision-language matching model can help users find the most relevant images given a text description or an image input and help tools such as Google Len...

