Are you doing what I say? On modalities alignment in ALFRED

10/12/2021
by Ting-Rui Chiang, et al.

ALFRED is a recently proposed benchmark that requires a model to complete tasks, specified by natural language instructions, in simulated house environments. We hypothesize that the key to success is accurately aligning the text modality with visual inputs. Motivated by this, we inspect how well existing models align these modalities using our proposed intrinsic metric, the boundary adherence score (BAS). The results show that previous models indeed fail to perform proper alignment. To address this issue, we introduce approaches aimed at improving model alignment and demonstrate that improved alignment improves end-task performance.
