Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have
de...
Binary code similarity detection (BCSD) has various applications, includ...
The recently proposed segment anything model (SAM) has made a significan...
Recent researches indicate that utilizing the frequency information of i...
Background subtraction (BGS) aims to extract all moving objects in the v...
Inspired by masked language modeling (MLM) in natural language processin...
Skeleton-based action recognition has become popular in recent years due...
We present a simple yet effective end-to-end Video-language Pre-training...
Visual tasks vary a lot in their output formats and concerned contents,
...
Previous unsupervised domain adaptation methods did not handle the
cross...
Clustering-based methods, which alternate between the generation of pseu...
Self-supervised learning (SSL) holds promise in leveraging large amounts...
In person re-identification (ReID), very recent researches have validate...
Structural neural network pruning aims to remove the redundant channels ...
3D human pose and shape recovery from a monocular RGB image is a challen...
Transformer has achieved great success in computer vision, while how to ...
In this paper, we propose an Omni-perception Pre-Trainer (OPT) for
cross...
Transformer has been widely used for self-supervised pre-training in Nat...
Transformer is showing its superiority over convolutional architectures ...
To address the problem of long-tail distribution for the large vocabular...
Existing alignment-based methods have to employ the pretrained human par...
Few-shot image classification requires the classifier to robustly cope w...
Recently, deep learning based facial landmark detection has achieved gre...
Scene text recognition has received increased attention in the research
...
Correlation filter (CF) based trackers are currently ranked top in terms...
Recently, correlation filter based trackers (CF trackers) have attracted...
In recent years, two types of trackers, namely correlation filter based
...
Reading text in the wild is a challenging task in the field of computer
...
The region-based Convolutional Neural Network (CNN) detectors such as Fa...
Image matting plays an important role in image and video editing. Howeve...
Foreground segmentation in video sequences is a classic topic in compute...