We propose EmoDistill, a novel speech emotion recognition (SER) framewor...
For faster sampling and higher sample quality, we propose DiNof (Diffusi...
A new method is proposed for human motion prediction by learning tempora...
The high prevalence of cardiovascular diseases (CVDs) calls for accessib...
Cognitive load, the amount of mental effort required for task completion...
We propose a novel solution for predicting future trajectories of pedest...
Deep learning has played a significant role in the success of facial exp...
A continual learning solution is proposed to address the out-of-distribu...
The Long-Tailed Recognition (LTR) problem emerges in the context of lear...
Ubiquitous in-home health monitoring systems have become popular in rece...
We present a novel approach for the detection of deepfake videos using a...
We present a novel approach to mitigate bias in facial expression recogn...
Video self-supervised learning (VSSL) has made significant progress in r...
Deep learning-based methods have been the key driving force behind much ...
We propose UnMixMatch, a semi-supervised learning framework which can le...
We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tun...
Remote Photoplethysmography (rPPG) is the process of estimating PPG from...
Most semi-supervised learning (SSL) models entail complex structures and...
This paper presents a systematic investigation into the effectiveness of...
Through this paper, we introduce a novel driver cognitive load assessmen...
With the ubiquity of smart devices that use speaker recognition (SR) sys...
Despite the impressive performance of vision-based pose estimators, they...
Fully supervised learning has recently achieved promising performance in...
Deep audio representation learning using multi-modal audio-visual data o...
In semi-supervised representation learning frameworks, when the number o...
We present XKD, a novel self-supervised framework to learn meaningful re...
Prior work has shown that the order in which different components of the...
We present ConCur, a contrastive video representation learning method th...
In this paper, we propose a self-supervised learning solution for human ...
Training deep neural networks for image recognition often requires large...
Attention mechanisms have emerged as important tools that boost the perf...
We present ObjectBox, a novel single-stage anchor-free and highly genera...
We propose a novel neural pipeline, MSGazeNet, that learns gaze represen...
In-bed pose estimation has shown value in fields such as hospital patien...
We propose cross-modal attentive connections, a new dynamic and effectiv...
3D hand pose estimation (HPE) is the process of locating the joints of t...
We introduce AVCAffe, the first Audio-Visual dataset consisting of Cogni...
We propose a multitask approach for crowd counting and person localizati...
We propose PARSE, a novel semi-supervised architecture for learning stro...
We present a novel multistream network that learns robust eye representa...
We propose an end-to-end architecture for facial expression recognition....
The technology used in smart homes has improved to learn the user prefe...
We present CrissCross, a self-supervised framework for learning audio-vi...
Hand pose estimation (HPE) can be used for a variety of human-computer i...
Recently, supervised methods, which often require substantial amounts of...
Automatic classification of running styles can enable runners to obtain ...
Electrocardiogram (ECG) has been widely used for emotion recognition. Th...
Facial expression recognition (FER) has emerged as an important componen...
We propose a self-supervised contrastive learning approach for facial ex...
Classification of human emotions can play an essential role in the desig...