SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations

08/17/2023
by Zhiming Wang, et al.

Due to the prevalence of scale variance in natural images, we propose to use image scale as a self-supervised signal for Masked Image Modeling (MIM). Our method selects random patches from the input image and downsamples them to a low-resolution format. Our framework leverages the latest advances in super-resolution (SR) to design the prediction head, which reconstructs the input from the low-resolution clues and the remaining patches. After 400 epochs of pre-training, our Super Resolution Masked Autoencoders (SRMAE) reach an accuracy of 82.1% on ImageNet-1K. The image scale signal also allows SRMAE to capture scale-invariant representations. For the very low resolution (VLR) recognition task, our model achieves the best performance, surpassing DeriveNet by 1.3%. Our method also achieves strong results on recognizing low-resolution facial expressions, surpassing the current state-of-the-art FMD by 9.48%.
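The sketch below illustrates the patch selection and downsampling step described above, written in PyTorch. The patch size, downsampling factor, masking ratio, and the choice to re-upsample the low-resolution patches back onto the original grid are illustrative assumptions for clarity; they are not taken from the paper, and the actual SRMAE pipeline and its SR prediction head may handle the low-resolution patches differently.

```python
# Minimal sketch of "mask by downsampling": split an image into patches,
# pick a random subset, and replace each picked patch with a low-resolution
# version. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def downsample_random_patches(img, patch_size=16, scale=4, mask_ratio=0.75):
    """img: (C, H, W) tensor. Returns the corrupted image and a (gh, gw)
    boolean mask marking which patches were turned into low-res clues."""
    C, H, W = img.shape
    gh, gw = H // patch_size, W // patch_size      # patch grid dimensions
    num_patches = gh * gw
    num_lr = int(mask_ratio * num_patches)

    # Randomly choose which patches become low-resolution clues.
    idx = torch.randperm(num_patches)[:num_lr]
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[idx] = True

    corrupted = img.clone()
    for i in idx.tolist():
        r, c = divmod(i, gw)
        ys, xs = r * patch_size, c * patch_size
        patch = img[:, ys:ys + patch_size, xs:xs + patch_size].unsqueeze(0)
        # Downsample the patch to remove high-frequency detail.
        lr = F.interpolate(patch, scale_factor=1 / scale,
                           mode="bicubic", align_corners=False)
        # Re-upsample so the corrupted image keeps its original resolution;
        # an SR-style prediction head must then recover the lost detail.
        restored = F.interpolate(lr, size=(patch_size, patch_size),
                                 mode="nearest")
        corrupted[:, ys:ys + patch_size, xs:xs + patch_size] = restored[0]

    return corrupted, mask.view(gh, gw)
```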

