EARL: Speedup Transformer-based Rankers with Pre-computed Representation

04/28/2020
by Luyu Gao, et al.

Recent innovations in Transformer-based ranking models have advanced the state of the art in information retrieval, but their accuracy gains come at a steep computational cost. This paper presents Embed Ahead Rank Later (EARL), a novel framework that speeds up Transformer-based rankers by pre-computing representations and keeping online computation shallow. EARL disentangles the attention in a typical Transformer-based ranker into three asynchronous tasks, each assigned to a dedicated Transformer: query understanding, document understanding, and relevance judging. With this ranking framework, query and document token representations can be computed offline and reused. We also propose a new judger Transformer block that keeps online relevance judging light and shallow. Our experiments demonstrate that EARL can match the accuracy of previous state-of-the-art BERT rankers while being substantially faster at evaluation time.
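As a rough illustration of the embed-ahead/rank-later split described above, the sketch below separates deep document and query encoders (whose token representations can be computed offline and cached) from a shallow online judger. The judger is approximated here with standard PyTorch Transformer layers; all module names, layer counts, dimensions, and the pooled scoring head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the EARL idea (embed ahead, rank later), assuming
# hypothetical sizes and a standard PyTorch decoder layer as the "judger".
import torch
import torch.nn as nn


class EARLSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_enc_layers=6, n_judge_layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Deep encoders whose outputs can be computed offline and cached.
        self.query_encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)
        self.doc_encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)
        # Shallow judger run online: query tokens attend to document tokens.
        judge_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.judger = nn.TransformerDecoder(judge_layer, n_judge_layers)
        self.score = nn.Linear(d_model, 1)

    @torch.no_grad()
    def embed_ahead(self, doc_emb):
        # Offline: pre-compute and store document token representations.
        return self.doc_encoder(doc_emb)

    def rank_later(self, query_emb, cached_doc_repr):
        # Online: shallow cross-attention over the cached representations.
        q_repr = self.query_encoder(query_emb)  # query representations can also be cached and reused
        joint = self.judger(q_repr, cached_doc_repr)
        return self.score(joint.mean(dim=1)).squeeze(-1)  # one relevance score per pair


model = EARLSketch()
doc = torch.randn(8, 128, 256)            # (batch, doc_len, d_model) token embeddings
query = torch.randn(8, 16, 256)           # (batch, query_len, d_model)
cached = model.embed_ahead(doc)           # done once, offline (e.g. at indexing time)
scores = model.rank_later(query, cached)  # fast online pass
print(scores.shape)                       # torch.Size([8])
```

In a setup like this, embed_ahead would run once per document at indexing time and its outputs would be stored, so the per-query cost reduces to the query encoder plus the shallow judger, which is the source of the speedup the abstract claims.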
