Swaroop Mishra

06/25/2023

Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

Language models still struggle on moral reasoning, despite their impress...

0 Xiao Ma, et al.

02/09/2023

Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow

Recent research has shown that language models exploit 'artifacts' in be...

2 Anjana Arunkumar, et al.

10/31/2022

Lila: A Unified Benchmark for Mathematical Reasoning

Mathematical reasoning skills are essential for general-purpose intellig...

15 Swaroop Mishra, et al.

10/14/2022

Pretrained Transformers Do not Always Improve Robustness

Pretrained Transformers (PT) have been shown to improve Out of Distribut...

2 Swaroop Mishra, et al.

10/14/2022

Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task

Evaluation of models on benchmarks is unreliable without knowing the deg...

1 Swaroop Mishra, et al.

10/14/2022

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Several benchmarks have been built with heavy investment in resources to...

12 Swaroop Mishra, et al.

10/10/2022

Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications

With the increasing importance of safety requirements associated with th...

9 Swaroop Mishra, et al.

09/20/2022

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

When answering a question, humans utilize the information available acro...

73 Pan Lu @ UCLA, et al.

08/17/2022

HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models

Controlling the text generated by language models and customizing the co...

21 Swaroop Mishra, et al.

07/06/2022

BioTABQA: Instruction Learning for Biomedical Table Question Answering

Table Question Answering (TQA) is an important but under-explored task. ...

14 Man Luo, et al.

05/25/2022

Is a Question Decomposition Unit All We Need?

Large Language Models (LMs) have achieved state-of-the-art performance o...

8 Pruthvi Patel, et al.

05/19/2022

Let the Model Decide its Curriculum for Multitask Learning

Curriculum learning strategies in prior multi-task learning approaches a...

0 Neeraj Varshney, et al.

05/01/2022

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions

In recent years, progress in NLU has been driven by benchmarks. These be...

5 Mihir Parmar, et al.

04/15/2022

In-BoXBART: Get Instructions into Biomedical Multi-Task Learning

Single-task models have proven pivotal in solving specific tasks; howeve...

23 Mihir Parmar, et al.

04/12/2022

NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

Given the ubiquitous nature of numbers in text, reasoning with numbers t...

6 Swaroop Mishra, et al.

03/17/2022

How Many Data Samples is an Additional Instruction Worth?

Recently introduced instruction-paradigm empowers non-expert users to le...

4 Ravsehaj Singh Puri, et al.

03/16/2022

Less is More: Summary of Long Instructions is Better for Program Synthesis

Despite the success of large pre-trained language models (LMs) such as C...

2 Kirby Kuznia, et al.

03/15/2022

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Data modification, either via additional training datasets, data augment...

4 Tejas Gokhale, et al.

03/12/2022

A Proposal to Study "Is High Quality Data All We Need?"

Even though deep neural models have achieved superhuman performance on m...

10 Swaroop Mishra, et al.

03/07/2022

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Knowledge of questions' difficulty level helps a teacher in several ways...

2 Neeraj Varshney, et al.

03/01/2022

Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings

In order to equip NLP systems with selective prediction capability, seve...

1 Neeraj Varshney, et al.

09/16/2021

Reframing Instructional Prompts to GPTk's Language

How can model designers turn task instructions into effective prompts fo...

9 Swaroop Mishra, et al.

08/31/2021

Development of User-friendly Smart Grid Architecture

As systems like smart grid continue to become complex on a daily basis, ...

0 Swaroop Mishra, et al.

07/01/2021

Interviewer-Candidate Role Play: Towards Developing Real-World NLP Systems

Standard NLP tasks do not incorporate several common real-world scenario...

12 Neeraj Varshney, et al.

06/10/2021

Front Contribution instead of Back Propagation

Deep Learning's outstanding track record across several domains has stem...

48 Swaroop Mishra, et al.

06/10/2021

How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation

Models that top leaderboards often perform unsatisfactorily when deploye...

8 Swaroop Mishra, et al.

05/29/2021

Constructing Flow Graphs from Procedural Cybersecurity Texts

Following procedural texts written in natural languages is challenging. ...

16 Kuntal Kumar Pal, et al.

04/18/2021

Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions

Can we enable NLP models to appropriately respond to instructional promp...

12 Swaroop Mishra, et al.

08/21/2020

It's better to say "I can't answer" than answering incorrectly: Towards Safety critical NLP systems

In order to make AI systems more reliable and their adoption in safety c...

0 Neeraj Varshney, et al.

08/10/2020

DQI: A Guide to Benchmark Evaluation

A 'state of the art' model A surpasses humans in a benchmark B, but fail...

11 Swaroop Mishra, et al.

07/14/2020

Our Evaluation Metric Needs an Update to Encourage Generalization

Models that surpass human performance on several popular benchmarks disp...

6 Swaroop Mishra, et al.

05/18/2020

Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks

Numerical reasoning is often important to accurately understand the worl...

17 Swaroop Mishra, et al.

05/02/2020

DQI: Measuring Data Quality in NLP

Neural language models have achieved human level performance across seve...

0 Swaroop Mishra, et al.

09/19/2019

Exploring ways to incorporate additional knowledge to improve Natural Language Commonsense Question Answering

DARPA and Allen AI have proposed a collection of datasets to encourage r...

0 Arindam Mitra, et al.