The explosive growth of language models and their applications has led ...
Pretrained multilingual large language models have typically used heuris...
We study the design decisions of publicly available instruction tuning m...
Large language models (LLMs) have demonstrated impressive capabilities i...
Scaling language models improves performance but comes with significant ...
We evaluate the reasoning abilities of large language models in multilin...
There has been a lot of interest in the scaling properties of Transform...
Large pretrained Transformer language models have been shown to exhibit ...
Large language models have been shown to achieve remarkable performance ...
Recent neural network-based language models have benefited greatly from ...
Recent developments in machine translation and multilingual text generat...
There remain many open questions pertaining to the scaling behaviour of ...
State-of-the-art models in natural language processing rely on separate ...
Transformers are state-of-the-art models in NLP that map a given input s...
The research community has proposed copious modifications to the Transfo...
In many applications of machine learning, certain categories of examples...
We re-evaluate the standard practice of sharing weights between input an...
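A minimal sketch of what that shared-weight practice usually looks like, assuming a PyTorch-style language model; the class, layer sizes, and names below are hypothetical and are not taken from the cited work:

import torch
import torch.nn as nn

class TinyLM(nn.Module):
    # Hypothetical toy language model illustrating weight tying between the
    # input embedding and the output (softmax) projection.
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)            # input embedding
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # output projection
        self.lm_head.weight = self.embed.weight                    # tie: one shared matrix

    def forward(self, token_ids):
        h, _ = self.backbone(self.embed(token_ids))
        return self.lm_head(h)  # logits over the vocabulary

With tying, the (vocab_size x d_model) embedding matrix is reused as the output projection, so the model stores it only once.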
State-of-the-art multilingual models depend on vocabularies that cover a...
The quality of machine translation systems has dramatically improved ove...
Transformer-based models have achieved state-of-the-art results in many t...