Are you doing what I say? On modalities alignment in ALFRED

10/12/2021

ALFRED is a recently proposed benchmark that requires a model to complete tasks, specified by natural-language instructions, in simulated house environments. We hypothesize that a key to success is accurately aligning the text modality with visual inputs. Motivated by this, we inspect how well existing models align these modalities using our proposed intrinsic metric, the boundary adherence score (BAS). The results show that previous models indeed fail to perform proper alignment. To address this issue, we introduce approaches aimed at improving model alignment and demonstrate how improved alignment improves end-task performance.
