Shrikanth Narayanan

research

∙ 09/18/2023

Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization

Video summarization remains a huge challenge in computer vision due to t...

0 Yoonsoo Nam, et al. ∙

research

∙ 09/15/2023

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Significant advances are being made in speech emotion recognition (SER) ...

0 Tiantian Feng, et al. ∙

research

∙ 08/27/2023

MM-AU: Towards Multimodal Understanding of Advertisement Videos

Advertisement videos (ads) play an integral part in the domain of Intern...

0 Digbalay Bose, et al. ∙

research

∙ 08/24/2023

Emotion-Aligned Contrastive Learning Between Images and Music

Traditional music search engines rely on retrieval methods that match na...

0 Shanti Stewart, et al. ∙

research

∙ 07/10/2023

Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process

Continuously-worn wearable sensors enable researchers to collect copious...

0 Tiantian Feng, et al. ∙

research

∙ 06/13/2023

Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

Automatic Speech Understanding (ASU) leverages the power of deep learnin...

0 Tiantian Feng, et al. ∙

research

∙ 06/08/2023

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

Many recent studies have focused on fine-tuning pre-trained models for s...

0 Tiantian Feng, et al. ∙

research

∙ 05/18/2023

TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

Recent studies have explored the use of pre-trained embeddings for speec...

0 Tiantian Feng, et al. ∙

research

∙ 04/17/2023

Signal Processing Grand Challenge 2023 – e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients

This paper presents the approach and results of USC SAIL's submission to...

7 Kleanthis Avramidis, et al. ∙

research

∙ 04/03/2023

Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP

There is an imminent need for guidelines and standard test sets to allow...

0 Nikolaos Antoniou, et al. ∙

research

∙ 03/13/2023

Contextually-rich human affect perception using multimodal scene information

The process of human affect understanding involves the ability to infer ...

10 Digbalay Bose, et al. ∙

research

∙ 02/14/2023

A dataset for Audio-Visual Sound Event Detection in Movies

Audio event detection is a widely studied audio processing task, with ap...

0 Rajat Hebbar, et al. ∙

research

∙ 12/18/2022

Exploring Workplace Behaviors through Speaking Patterns using Large-scale Multimodal Wearable Recordings: A Study of Healthcare Providers

Interpersonal spoken communication is central to human interaction and t...

0 Tiantian Feng, et al. ∙

research

∙ 12/01/2022

Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection

Active speaker detection in videos addresses associating a source face, ...

0 Rahul Sharma, et al. ∙

research

∙ 11/25/2022

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

With the similarity between music and speech synthesis from symbolic inp...

0 Xuan Shi, et al. ∙

research

∙ 11/07/2022

A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations

Vocal entrainment is a social adaptation mechanism in human interaction,...

0 Rimita Lahiri, et al. ∙

research

∙ 10/31/2022

Using Emotion Embeddings to Transfer Knowledge Between Emotions, Languages, and Annotation Formats

The need for emotional inference from text continues to diversify as mor...

0 Georgios Chochlakis, et al. ∙

research

∙ 10/28/2022

Leveraging Label Correlations in a Multi-label Setting: A Case Study in Emotion

Detecting emotions expressed in text has become critical to a range of f...

0 Georgios Chochlakis, et al. ∙

research

∙ 10/28/2022

On the Role of Visual Context in Enriching Music Representations

Human perception and experience of music is highly context-dependent. Co...

0 Kleanthis Avramidis, et al. ∙

research

∙ 10/28/2022

Multimodal Estimation of Change Points of Physiological Arousal in Drivers

Detecting unsafe driving states, such as stress, drowsiness, and fatigue...

0 Kleanthis Avramidis, et al. ∙

research

∙ 10/25/2022

Leveraging Open Data and Task Augmentation to Automated Behavioral Coding of Psychotherapy Conversations in Low-Resource Scenarios

In psychotherapy interactions, the quality of a session is assessed by c...

0 Zhuohao Chen, et al. ∙

research

∙ 10/20/2022

MovieCLIP: Visual Scene Recognition in Movies

Longform media such as movies have complex narrative structures, with ev...

17 Digbalay Bose, et al. ∙

research

∙ 09/24/2022

Unsupervised active speaker detection in media content using cross-modal information

We present a cross-modal unsupervised framework for active speaker detec...

6 Rahul Sharma, et al. ∙

research

∙ 08/18/2022

VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations

We propose the Vision-and-Augmented-Language Transformer (VAuLT). VAuLT ...

30 Georgios Chochlakis, et al. ∙

research

∙ 07/10/2022

Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning

Papilledema is an ophthalmic neurologic disorder in which increased intr...

18 Kleanthis Avramidis, et al. ∙

research

∙ 04/05/2022

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

Many existing privacy-enhanced speech emotion recognition (SER) framewor...

0 Tiantian Feng, et al. ∙

research

∙ 04/01/2022

Multimodal Clustering with Role Induced Constraints for Speaker Diarization

Speaker clustering is an essential step in conventional speaker diarizat...

0 Nikolaos Flemotomos, et al. ∙

research

∙ 03/30/2022

Using Active Speaker Faces for Diarization in TV shows

Speaker diarization is one of the critical components of computational m...

0 Rahul Sharma, et al. ∙

research

∙ 03/29/2022

Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems

A variety of recent works have looked into defenses for deep neural netw...

0 Nicholas Mehlman, et al. ∙

research

∙ 03/21/2022

Audio visual character profiles for detecting background characters in entertainment media

An essential goal of computational media intelligence is to support unde...

0 Rahul Sharma, et al. ∙

research

∙ 03/15/2022

Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling

Speech Emotion Recognition (SER) application is frequently associated wi...

1 Tiantian Feng, et al. ∙

research

∙ 10/11/2021

Cross Domain Emotion Recognition using Few Shot Knowledge Transfer

Emotion recognition from text is a challenging task due to diverse emoti...

10 Justin Olah, et al. ∙

research

∙ 10/08/2021

Representation of professions in entertainment media: Insights into frequency and sentiment trends through computational text analysis

Societal ideas and trends dictate media narratives and cinematic depicti...

1 Sabyasachee Baruah, et al. ∙

research

∙ 09/03/2021

Phone Duration Modeling for Speaker Age Estimation in Children

Automatic inference of important paralinguistic information such as age ...

0 Prashanth Gurunath Shivakumar, et al. ∙

research

∙ 07/12/2021

Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems

In this paper we investigate speech denoising as a defense against adver...

0 Anirudh Sreeram, et al. ∙

research

∙ 06/15/2021

An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates

Computational approaches for assessing the quality of conversation-based...

8 Zhuohao Chen, et al. ∙

research

∙ 04/05/2021

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Key challenges in developing generalized automatic emotion recognition s...

0 Haoqi Li, et al. ∙

research

∙ 04/01/2021

Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks

Speech encodes a wealth of information related to human behavior and has...

10 Haoqi Li, et al. ∙

research

∙ 03/04/2021

Front-end Diarization for Percussion Separation in Taniavartanam of Carnatic Music Concerts

Instrument separation in an ensemble is a challenging task. In this work...

0 Nauman Dawalatabad, et al. ∙

research

∙ 02/23/2021

Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations

During a psychotherapy session, the counselor typically adopts technique...

2 Nikolaos Flemotomos, et al. ∙

research

∙ 02/22/2021

Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

With the growing prevalence of psychological interventions, it is vital ...

9 Nikolaos Flemotomos, et al. ∙

research

∙ 02/19/2021

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

A key desideratum for inclusive and accessible speech recognition technol...

0 Prashanth Gurunath Shivakumar, et al. ∙

research

∙ 02/03/2021

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Word vector representations enable machines to encode human language for...

7 Prashanth Gurunath Shivakumar, et al. ∙

research

∙ 01/24/2021

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with cl...

3 Tae Jin Park, et al. ∙

research

∙ 08/25/2020

Multi-Face: Self-supervised Multiview Adaptation for Robust Face Clustering in Videos

Robust face clustering is a key step towards computational understanding...

0 Krishna Somandepalli, et al. ∙

research

∙ 08/19/2020

Victim or Perpetrator? Analysis of Violent Characters Portrayals from Movie Scripts

Violent content in the media can influence viewers' perception of the so...

0 Victor R Martinez, et al. ∙

research

∙ 08/18/2020

Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems

Robust speaker recognition, including in the presence of malicious attac...

0 Arindam Jati, et al. ∙

research

∙ 08/04/2020

Having a Bad Day? Detecting the Impact of Atypical Life Events Using Wearable Sensors

Life events can dramatically affect our psychological state and work per...

0 Keith Burghardt, et al. ∙

research

∙ 07/31/2020

Designing Neural Speaker Embeddings with Meta Learning

Neural speaker embeddings trained using classification objectives have d...

0 Tae Jin Park, et al. ∙

research

∙ 07/27/2020

Evidence of Task-Independent Person-Specific Signatures in EEG using Subspace Techniques

Electroencephalography (EEG) signals are promising as a biometric owing ...

0 Mari Ganesh Kumar, et al. ∙
