In this paper, we propose a novel cross-modal distillation method, calle...
Deep Neural Networks can be easily fooled by small and imperceptible
per...
Recent years have witnessed the great success of vision transformer (ViT...
We present InstructDiffusion, a unifying and generic framework for align...
In this paper, we propose a robust aggregation method for federated lear...
The recent wave of AI-generated content has witnessed the great developm...
Over-the-air computation is a communication-efficient solution for feder...
Federated learning (FL) has emerged as an appealing machine learning app...
Weakly supervised object localization (WSOL) is one of the most popular ...
Federated Learning aims to learn a global model on the server side that
...
While language-guided image manipulation has made remarkable progress, t...
Recently, the efficient deployment and acceleration of powerful vision
t...
With the popularity of smartphones and tablets, users have become accust...
With the increasing interconnection of smart devices, users often desire...
Powered by the rising popularity of deep learning techniques on smartpho...
The lack of façade structures in photogrammetric mesh models renders the...
The tremendous success of large models trained on extensive datasets
dem...
This paper studies multiparty learning, aiming to learn a model using th...
Vision transformers have shown great success due to their high model
cap...
In this paper, we present a new sequence-to-sequence learning framework ...
Automatic and periodic recompiling of building databases with up-to-date...
Most urban applications necessitate building footprints in the form of
c...
The accurate representation of 3D building models in urban environments ...
Human pose is typically represented by a coordinate vector of body joint...
The ability to classify images accurately and efficiently is dependent o...
Denoising diffusion models have been a mainstream approach for image
gen...
Deep supervision, which involves extra supervisions to the intermediate
...
Lung cancer is the leading cause of cancer death worldwide. The best sol...
Federated learning (FL) enables multiple clients to train a machine lear...
This paper presents a new framework for open-vocabulary semantic segment...
Federated learning aims to collaboratively train models without accessin...
Large-scale language models have achieved tremendous success across vari...
Unlike language tasks, where the output space is usually limited to a se...
Masked image modeling (MIM) performs strongly in pre-training large visi...
This work proposes a framework developed to generalize Critical Heat Flu...
Image token removal is an efficient augmentation strategy for reducing t...
Vision Transformers (ViTs) have achieved overwhelming success, yet they
...
Semi-supervised action recognition is a challenging but critical task du...
The image captioning task is typically realized by an auto-regressive me...
Frozen pretrained models have become a viable alternative to the
pretrai...
Semantic communication in the 6G era has been deemed a promising
communi...
Few-shot visual recognition refers to recognize novel visual concepts fr...
Few-shot part segmentation aims to separate different parts of an object...
In this paper, we show that the process of continually learning new task...
An important goal of self-supervised learning is to enable model pre-tra...
Masked image modeling (MIM) learns representations with remarkably good
...
Masked image modeling (MIM) as pre-training is shown to be effective for...
Recently, deploying deep neural network (DNN) models via collaborative
i...
Recently, the compression and deployment of powerful deep neural network...
Recent literature have shown design strategies from Convolutions Neural
...