Visually grounded speech systems learn from paired images and their spok...
The recently proposed Joint Energy-based Model (JEM) interprets
discrimi...
Considering the abundance of unlabeled speech data and the high labeling...
Adversarial attacks are a threat to automatic speech recognition (ASR)
s...
Adversarial attacks pose a severe security threat to the state-of-the-ar...
Speech systems developed for a particular choice of acoustic domain and
...
Typically, unsupervised segmentation of speech into the phone and word-l...
This technical report describes Johns Hopkins University speaker recogni...
Speech emotion recognition is the task of recognizing the speaker's emot...
Automatic detection of phoneme or word-like units is one of the core
obj...
The ubiquitous presence of machine learning systems in our lives necessi...
Research in automatic speaker recognition (SR) has been undertaken for
s...
Environmental noises and reverberation have a detrimental effect on the
...
This paper introduces a novel method to diagnose the source-target atten...
Data augmentation is a widely used strategy for training robust machine
...
Deep learning based speech denoising still suffers from the challenge of...
Zero-shot multi-speaker Text-to-Speech (TTS) generates target speaker vo...
Unsupervised spoken term discovery consists of two tasks: finding the
ac...
We investigated an enhancement and a domain adaptation approach to make
...
In this work, we explore the dependencies between speaker recognition an...
Data augmentation is conventionally used to inject robustness in Speaker...
This paper presents the problems and solutions addressed at the JSALT
wo...
Recently, very deep transformers have started to show superior performance t...
The task of making speaker verification systems robust to adverse scenar...
Current speaker recognition technology provides great performance with t...
Speaker Verification still suffers from the challenge of generalization ...
BERT, which stands for Bidirectional Encoder Representations from
Transf...
We present JHU's system submission to the ASVspoof 2019 Challenge:
Anti-...
In this paper, we explore several new schemes to train a seq2seq model t...
In this document we are going to derive the equations needed to implemen...
State-of-the-art speaker recognition relies on models that need a large
...
In some speaker recognition scenarios we find conversations recorded
sim...
In this document we are going to derive the equations needed to implemen...