Reliable automatic evaluation of summarization systems is challenging du...
Abstractive summarization has enjoyed renewed interest in recent years,
...
The acquisition of high-quality human annotations through crowdsourcing
...
Evaluation metrics that are not robust to dialect variation make it
impo...
Evaluation practices in natural language generation (NLG) have many know...
Human evaluations are typically considered the gold standard in natural
...
The paper surveys evaluation methods of natural language generation (NLG...
There is a fundamental gap between how humans understand and use languag...
Counterfactual reasoning requires predicting how alternative events, con...
We present Sounding Board, a social chatbot that won the 2017 Amazon Ale...