A New Approach for Query Expansion using Wikipedia and WordNet
Query expansion (QE) is a well known technique to enhance the effectiveness of information retrieval (IR). QE reformulates the initial query by adding similar terms that helps in retrieving more relevant results. Several approaches have been proposed with remarkable outcome, but they are not evenly favorable for all types of queries. One of the main reasons for this is the use of the same data source while expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured. To address this issue, we have selected separate data sources for individual and phrase terms. Specifically, we have used WordNet for expanding individual terms and Wikipedia for expanding phrase terms. We have also proposed novel schemes for weighting expanded terms: inlink score (for terms extracted from Wikipedia) and a tfidf based scheme (for terms extracted from WordNet). In the proposed Wikipedia WordNet based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored by the weighting scheme individually, and then, the weighting scheme scores the selected expansion terms in relation to the entire query using correlation score. The experimental results show that the proposed approach successfully combines Wikipedia and WordNet as demonstrated through a better performance on standard evaluation metrics on FIRE dataset. The proposed WWQE approach is also suitable with other standard weighting models for improving the effectiveness of IR.
READ FULL TEXT