Elizabeth Clark

research

∙ 05/22/2023

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

Reliable automatic evaluation of summarization systems is challenging du...

Elizabeth Clark, et al.

research

∙ 12/20/2022

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Abstractive summarization has enjoyed renewed interest in recent years, ...

Roee Aharoni, et al.

research

∙ 12/20/2022

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

The acquisition of high-quality human annotations through crowdsourcing ...

Lining Zhang, et al.

research

∙ 11/02/2022

Dialect-robust Evaluation of Generated Text

Evaluation metrics that are not robust to dialect variation make it impo...

Jiao Sun, et al.

research

∙ 02/14/2022

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Evaluation practices in natural language generation (NLG) have many know...

Sebastian Gehrmann, et al.

research

∙ 06/30/2021

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Human evaluations are typically considered the gold standard in natural ...

Elizabeth Clark, et al.

research

∙ 06/26/2020

Evaluation of Text Generation: A Survey

The paper surveys evaluation methods of natural language generation (NLG...

Asli Celikyilmaz, et al.

research

∙ 04/07/2020

Evaluating Machines by their Real-World Language Use

There is a fundamental gap between how humans understand and use languag...

Rowan Zellers, et al.

research

∙ 09/09/2019

Counterfactual Story Reasoning and Generation

Counterfactual reasoning requires predicting how alternative events, con...

Lianhui Qin, et al.

research

∙ 04/26/2018

Sounding Board: A User-Centric and Content-Driven Social Chatbot

We present Sounding Board, a social chatbot that won the 2017 Amazon Ale...

Hao Fang, et al.
