The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification

04/18/2019
by   Jimmy Lin, et al.
0

Motivated by recent commentary that has questioned today's pursuit of ever-more complex models and mathematical formalisms in applied machine learning and whether meaningful empirical progress is actually being made, this paper tries to tackle the decades-old problem of pseudo-relevance feedback with "the simplest thing that can possibly work". I present a technique based on training a document relevance classifier for each information need using pseudo-labels from an initial ranked list and then applying the classifier to rerank the retrieved documents. Experiments demonstrate significant improvements across a number of newswire collections, with initial rankings supplied by "bag of words" BM25 as well as from a well-tuned query expansion model. While this simple technique draws elements from several well-known threads in the literature, to my knowledge this exact combination has not previously been proposed and evaluated.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset