Training data attribution (TDA) methods offer to trace a model's predict...
Pretrained large language models (LLMs) are able to solve a wide variety...
Text-based safety classifiers are widely used for content moderation and...
Neural language models (LMs) have been shown to memorize a great deal of...
Developing a suitable Deep Neural Network (DNN) often requires significa...
Integrated Gradients (IG) is a commonly used feature attribution method ...
We describe an "interpretability illusion" that arises when analyzing th...
We present the Language Interpretability Tool (LIT), an open-source plat...
(Bolukbasi et al., 2016) demonstrated that pretrained word embeddings ca...
A key challenge in developing and deploying Machine Learning (ML) system...
Saliency methods can aid understanding of deep neural networks. Recent y...
As we move towards large-scale object detection, it is unrealistic to ex...
We present an approach to adaptively utilize deep neural networks in ord...
The blind application of machine learning runs the risk of amplifying bi...
Machine learning algorithms are optimized to model statistical propertie...
We study the problem of structured prediction under test-time budget
con...