Position Masking for Improved Layout-Aware Document Understanding

09/01/2021
by Anik Saha, et al.

Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes. Layout-aware word embeddings such as LayoutLM have shown promise for classification of, and information extraction from, such documents. This paper proposes a new pre-training task called position masking that can improve performance of layout-aware word embeddings that incorporate 2-D position embeddings. We compare models pre-trained with only language masking against models pre-trained with both language masking and position masking, and we find that position masking improves performance by over 5%.
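Position masking can be thought of as a 2-D analogue of masked language modeling: instead of hiding a token's identity, the pre-training objective hides its bounding-box coordinates and asks the model to predict them. The sketch below is only an illustration of that idea under assumed conventions (a per-token bounding box and a sentinel mask box); the function and constant names are hypothetical, not the paper's actual API.

```python
import random

# Sentinel bounding box substituted for masked positions (assumption,
# analogous to the [MASK] token in masked language modeling).
MASK_POS = (0, 0, 0, 0)

def mask_positions(tokens, bboxes, mask_prob=0.15):
    """Randomly mask 2-D positions (bounding boxes) for pre-training.

    Returns the corrupted bbox sequence plus labels: the original bbox
    where a position was masked, None elsewhere, so that a regression or
    classification loss is computed only over masked positions.
    """
    masked_bboxes, labels = [], []
    for _tok, box in zip(tokens, bboxes):
        if random.random() < mask_prob:
            masked_bboxes.append(MASK_POS)
            labels.append(box)   # model must recover the original bbox
        else:
            masked_bboxes.append(box)
            labels.append(None)  # ignored by the loss
    return masked_bboxes, labels
```

In practice this corruption step would run alongside ordinary token masking, so that a single pre-training batch supervises both the language and the layout signal.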
