Visual Grounding in Video for Unsupervised Word Translation

DeepMind · March 11, 2020
Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instructional videos narrated in the native language....
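
To make the idea of a common representation concrete, here is a minimal sketch of how word translation could be read off once words from both languages live in one shared embedding space. Everything in it is an assumption for illustration: the truncated abstract above does not say how translations are retrieved, so nearest-neighbour lookup by cosine similarity is swapped in as a plausible stand-in, and the vocabularies, the toy vectors, and the `cosine` and `translate` helpers are hypothetical rather than taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate(word, src_emb, tgt_emb):
    """Return the target word whose embedding is closest to `word`.

    Assumes both dictionaries map words to vectors in the SAME space,
    e.g. one learned by grounding each language in video (hypothetical).
    """
    query = src_emb[word]
    return max(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]))

# Toy, hand-made embeddings (NOT real data): each axis loosely stands
# for a visual concept shared by both languages.
english = {
    "knife": [1.0, 0.1, 0.0],
    "bowl":  [0.0, 1.0, 0.1],
    "onion": [0.1, 0.0, 1.0],
}
french = {
    "oignon":  [0.0, 0.1, 0.9],
    "couteau": [0.9, 0.0, 0.1],
    "bol":     [0.1, 0.8, 0.0],
}

for w in english:
    print(w, "->", translate(w, english, french))
```

In this sketch no paired word list is used at lookup time: the only link between the two vocabularies is the shared space, which is the role the abstract assigns to the visual representation.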