Text-to-image models suffer from various safety issues that may limit th...
Much of the knowledge encoded in transformer language models (LMs) may b...
Recent studies show that instruction tuning and learning from human feed...
Before deploying a language model (LM) within a given domain, it is impo...
Text-to-image models are trained on extensive amounts of data, leading t...
Mathematical reasoning in large language models (LLMs) has garnered atte...
Recent advances in interpretability suggest we can project weights and h...
Natural language processing models tend to learn and encode social biase...
Recent work has compared neural network representations via similarity-b...
Text-to-image diffusion models often make implicit assumptions about the...
For applications that require processing large amounts of text at infere...
Models trained from real-world data tend to imitate and amplify social b...
Dual encoders are now the dominant architecture for dense retrieval. Yet...
The field of emergent communication aims to understand the characteristi...
Considerable efforts to measure and mitigate gender bias in recent years...
Neural networks are known to exploit spurious artifacts (or shortcuts) t...
Recent work has shown exciting promise in updating large language models...
Large amounts of training data are one of the major reasons for the high...
Large pre-trained models are usually fine-tuned on downstream task data,...
Huge language models (LMs) have ushered in a new era for AI, serving as ...
Common studies of gender bias in NLP focus either on extrinsic bias meas...
Most evaluations of attribution methods focus on the English language. I...
We investigate the mechanisms underlying factual knowledge recall in aut...
While many studies have shown that linguistic information is encoded in...
Model robustness to bias is often determined by the generalization on ca...
Many natural language inference (NLI) datasets contain biases that allow...
Targeted syntactic evaluations have demonstrated the ability of language...
While large-scale pretrained language models have obtained impressive re...
Natural Language Inference (NLI) models are known to learn from biases a...
Probing classifiers have emerged as one of the prominent methodologies f...
State-of-the-art natural language processing (NLP) models often learn to...
Self-supervised speech representation learning has recently been a prosp...
While a lot of analysis has been carried out to demonstrate linguistic knowl...
The predominant approach to open-domain dialog generation relies on end-...
Large-scale pretrained language models are the major driving force behin...
This paper investigates contextual word representation models from the l...
Common methods for interpreting neural models in natural language proces...
Large pre-trained contextual word representations have transformed the f...
We introduce three memory-augmented Recurrent Neural Networks (MARNNs) a...
Despite the recent success of deep neural networks in natural language p...
The dependency of the generalization error of neural networks on model a...
Popular Natural Language Inference (NLI) datasets have been shown to be...
Natural Language Inference (NLI) datasets often contain hypothesis-only...
End-to-end neural network systems for automatic speech recognition (ASR)...
We share the findings of the first shared task on improving robustness o...
Visual question answering (VQA) models have been shown to over-rely on l...
In this paper, we systematically assess the ability of standard recurren...
The Transformer is a fully attention-based alternative to recurrent netw...
Common language models typically predict the next word given the context...
Contextual word representations derived from large-scale neural language...