A Blueprint of IR Evaluation Integrating Task and User Characteristics: Test Collection and Evaluation Metrics

05/01/2023

Relevance is generally understood as a multi-level and multi-dimensional relationship between an information need and an information object. However, traditional IR evaluation metrics naively assume that relevance is one-dimensional. We ask: how should multidimensional and graded relevance assessments be handled in IR evaluation? Moreover, search result evaluation metrics neglect document overlap and naively assume that gains keep accumulating as the searcher examines the ranked list to greater depth. Consequently, we examine: how should document overlap be handled in IR evaluation? The usability of a document for a person-in-need also depends on document usability attributes beyond relevance. Therefore, we ask: how should usability attributes be handled, and how can they be combined with multidimensional relevance assessments in IR evaluation? Finally, we ask how to define a formal model that handles multidimensional graded relevance assessments, document overlap, and document usability attributes in a coherent framework serving IR evaluation.

Related Research

On the Effect of Ranking Axioms on IR Evaluation Metrics

Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

Relevance Judgment Convergence Degree – A Measure of Inconsistency among Assessors for Information Retrieval

Surprise: Result List Truncation via Extreme Value Theory

A Usefulness-based Approach for Measuring the Local and Global Effect of IIR Services

Atomized Search Length: Beyond User Models

Joint Upper Lower Bound Normalization for IR Evaluation