Paul Röttger


Featured Co-authors

James Zou
123 publications
Dan Jurafsky
77 publications
Tatsunori Hashimoto
37 publications
Federico Bianchi
34 publications
Dirk Hovy
29 publications
Scott A. Hale
22 publications
Bertie Vidgen
17 publications
Dong Nguyen
15 publications
Hannah Rose Kirk
15 publications
Debora Nozza
14 publications
Tristan Thrush
14 publications

research

09/14/2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Training large language models to follow instructions makes them perform...

Federico Bianchi, et al.

research

08/02/2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Without proper safeguards, large language models will readily follow mal...

Paul Röttger, et al.

research

06/20/2023

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

Many NLP tasks exhibit human label variation, where different annotators...

Matthias Orlikowski, et al.

research

03/09/2023

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Large language models (LLMs) are used to generate content for a wide ran...

Hannah Rose Kirk, et al.

research

03/07/2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism

Online sexism is a widespread and harmful phenomenon. Automated tools ca...

Hannah Rose Kirk, et al.

research

10/20/2022

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

Hate speech is a global phenomenon, but most hate speech datasets so far...

Paul Röttger, et al.

research

06/20/2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Hate speech detection models are typically evaluated on held-out test se...

Paul Röttger, et al.

research

12/14/2021

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Labelled data is the foundation of most natural language processing task...

Paul Röttger, et al.

research

08/12/2021

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

Detecting online hate is a complex task, and low-performing models have ...

Hannah Rose Kirk, et al.

research

04/16/2021

Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media

Language use differs between domains and even within a domain, language ...

Paul Röttger, et al.

research

12/31/2020

HateCheck: Functional Tests for Hate Speech Detection Models

Detecting online hate is a difficult task that even state-of-the-art mod...

Paul Röttger, et al.
