Successful active speaker detection requires a three-stage pipeline: (i)...
In this paper, we propose a new deep neural network classifier that
simu...
Robust object tracking requires knowledge of tracked objects' appearance...
Distracted drivers are more likely to fail to anticipate hazards, which
...
Convolutional Neural Networks with 3D kernels (3D CNNs) currently achiev...
The use of hand gestures provides a natural alternative to cumbersome
in...
For many practical problems and applications, it is not feasible to crea...
This paper studies unsupervised monocular depth prediction problem. Most...
Spatiotemporal action localization requires incorporation of two sources...
Understanding actions and gestures in video streams requires temporal
re...
Many road accidents occur due to distracted drivers. Today, driver monit...
The use of hand gestures provides a natural alternative to cumbersome
in...
Recently, convolutional neural networks with 3D kernels (3D CNNs) have b...
Real-time recognition of dynamic hand gestures from video streams is a
c...
A convolutional layer in a Convolutional Neural Network (CNN) consists o...
Acquiring spatio-temporal states of an action is the most crucial step f...