Weidi Xie

research

∙ 09/20/2023

A Large-scale Dataset for Audio-Language Representation Learning

The AI community has made significant strides in developing powerful fou...

0 Luoyi Sun, et al. ∙

research

∙ 09/13/2023

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

Magnetic resonance imaging (MRI) has played a crucial role in brain dis...

0 Jiayu Lei, et al. ∙

research

∙ 09/07/2023

The Making and Breaking of Camouflage

Not all camouflages are equally effective, as even a partially visible c...

0 Hala Lamdouar, et al. ∙

research

∙ 08/16/2023

Diagnosing Human-object Interaction Detectors

Although we have witnessed significant progress in human-object interact...

0 Fangrui Zhu, et al. ∙

research

∙ 08/09/2023

Joint-Relation Transformer for Multi-Person Motion Prediction

Multi-person motion prediction is a challenging problem due to the depen...

0 Qingyao Xu, et al. ∙

research

∙ 08/04/2023

Towards Generalist Foundation Model for Radiology

In this study, we aim to initiate the development of Radiology Foundatio...

0 Chaoyi Wu, et al. ∙

research

∙ 06/24/2023

Boost Video Frame Interpolation via Motion Adaptation

Video frame interpolation (VFI) is a challenging task that aims to gener...

0 Haoning Wu, et al. ∙

research

∙ 06/13/2023

arXiVeri: Automatic table verification with GPT

Without accurate transcription of numerical data in scientific documents...

3 Gyungin Shin, et al. ∙

research

∙ 06/12/2023

Zero-shot Composed Text-Image Retrieval

In this paper, we consider the problem of composed image retrieval (CIR)...

2 Yikun Liu, et al. ∙

research

∙ 06/08/2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection

The goal of this paper is open-vocabulary object detection (OVOD) – ...

9 Prannay Kaul, et al. ∙

research

∙ 06/01/2023

Intelligent Grimm – Open-ended Visual Storytelling via Latent Diffusion Models

Generative models have recently exhibited exceptional capabilities in va...

12 Chang Liu, et al. ∙

research

∙ 05/18/2023

Annotation-free Audio-Visual Segmentation

The objective of Audio-Visual Segmentation (AVS) is to localise the soun...

12 Jinxiang Liu, et al. ∙

research

∙ 05/17/2023

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

In this paper, we focus on the problem of Medical Visual Question Answer...

10 Xiaoman Zhang, et al. ∙

research

∙ 04/27/2023

PMC-LLaMA: Further Finetuning LLaMA on Medical Papers

Large Language Models (LLMs) have showcased remarkable capabilities in n...

13 Chaoyi Wu, et al. ∙

research

∙ 04/27/2023

Zero-shot Unsupervised Transfer Instance Segmentation

Segmentation is a core computer vision competency, with applications spa...

2 Gyungin Shin, et al. ∙

research

∙ 04/04/2023

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims at segmenting and categorizing obj...

2 Haochen Wang, et al. ∙

research

∙ 03/29/2023

AutoAD: Movie Description in Context

The objective of this paper is an automatic Audio Description (AD) model...

14 Tengda Han, et al. ∙

research

∙ 03/23/2023

Collaboration Helps Camera Overtake LiDAR in 3D Detection

Camera-only 3D detection provides an economical solution with a simple c...

7 Yue Hu, et al. ∙

research

∙ 03/21/2023

Multi-modal Prompting for Low-Shot Temporal Action Localization

In this paper, we consider the problem of temporal action localization u...

1 Chen Ju, et al. ∙

research

∙ 03/13/2023

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Foundation models trained on large-scale datasets have gained a recent surge in ...

3 Weixiong Lin, et al. ∙

research

∙ 02/27/2023

Knowledge-enhanced Pre-training for Auto-diagnosis of Chest Radiology Images

Despite the success of multi-modal foundation models pre-trained on l...

9 Xiaoman Zhang, et al. ∙

research

∙ 02/22/2023

K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging

In this paper, we consider the problem of disease diagnosis. Unlike the ...

2 Chaoyi Wu, et al. ∙

research

∙ 01/23/2023

OvarNet: Towards Open-vocabulary Object Attribute Recognition

In this paper, we consider the problem of simultaneously detecting objec...

6 Keyan Chen, et al. ∙

research

∙ 01/22/2023

Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

In this paper, we consider the problem of open-vocabulary semantic segme...

7 Jilan Xu, et al. ∙

research

∙ 01/12/2023

Guiding Text-to-Image Diffusion Model Towards Grounded Generation

The goal of this paper is to augment a pre-trained text-to-image diffusi...

0 Ziyi Li, et al. ∙

research

∙ 01/05/2023

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

In this paper, we consider the problem of enhancing self-supervised visu...

38 Chaoyi Wu, et al. ∙

research

∙ 10/27/2022

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

When trained at a sufficient scale, self-supervised learning has exhibit...

8 Chaofan Ma, et al. ∙

research

∙ 10/18/2022

A Tri-Layer Plugin to Improve Occluded Detection

Detecting occluded objects still remains a challenge for state-of-the-ar...

3 Guanqi Zhan, et al. ∙

research

∙ 10/13/2022

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

The objective of this paper is audio-visual synchronisation of general v...

21 Vladimir Iashin, et al. ∙

research

∙ 10/10/2022

Turbo Training with Token Dropout

The objective of this paper is an efficient training method for video ta...

8 Tengda Han, et al. ∙

research

∙ 10/07/2022

A Simple Plugin for Transforming Images to Arbitrary Scales

Existing super-resolution models are often specialized for one scale, fun...

7 Qinye Zhou, et al. ∙

research

∙ 10/01/2022

Motion-inductive Self-supervised Object Discovery in Videos

In this paper, we consider the task of unsupervised object discovery in ...

7 Shuangrui Ding, et al. ∙

research

∙ 09/22/2022

NamedMask: Distilling Segmenters from Complementary Foundation Models

The goal of this work is to segment and name regions of images without a...

0 Gyungin Shin, et al. ∙

research

∙ 09/12/2022

Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images

Two-dimensional (2D) freehand ultrasound is the mainstay in prenatal car...

7 Pak Hei Yeung, et al. ∙

research

∙ 08/29/2022

CounTR: Transformer-based Generalised Visual Counting

In this paper, we consider the problem of generalised visual object coun...

2 Chang Liu, et al. ∙

research

∙ 08/20/2022

Transforming the Interactive Segmentation for Medical Imaging

The goal of this paper is to interactively refine the automatic segmenta...

12 Wentao Liu, et al. ∙

research

∙ 08/08/2022

Aerial Monocular 3D Object Detection

Drones equipped with cameras can significantly enhance human ability to ...

2 Yue Hu, et al. ∙

research

∙ 07/05/2022

Segmenting Moving Objects via an Object-Centric Layered Representation

The objective of this paper is a model that is able to discover, track a...

1 Junyu Xie, et al. ∙

research

∙ 06/26/2022

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

We present a simple yet effective self-supervised framework for audio-vi...

6 Jinxiang Liu, et al. ∙

research

∙ 06/14/2022

ReCo: Retrieve and Co-segment for Zero-shot Transfer

Semantic segmentation has a broad range of applications, but its real-wo...

1 Gyungin Shin, et al. ∙

research

∙ 06/14/2022

K-Space Transformer for Fast MRI Reconstruction with Implicit Representation

This paper considers the problem of fast MRI reconstruction. We propose ...

4 Ziheng Zhao, et al. ∙

research

∙ 04/06/2022

Temporal Alignment Networks for Long-term Video

The objective of this paper is a temporal alignment network that ingests...

0 Tengda Han, et al. ∙

research

∙ 03/23/2022

Unsupervised Salient Object Detection with Spectral Cluster Voting

In this paper, we tackle the challenging task of unsupervised salient ob...

21 Gyungin Shin, et al. ∙

research

∙ 12/10/2021

Label, Verify, Correct: A Simple Few Shot Object Detection Method

The objective of this paper is few-shot object detection (FSOD) – the ta...

2 Prannay Kaul, et al. ∙

research

∙ 12/08/2021

Prompting Visual-Language Models for Efficient Video Understanding

Visual-language pre-training has shown great success for learning joint ...

0 Chen Ju, et al. ∙

research

∙ 12/08/2021

Audio-Visual Synchronisation in the wild

In this paper, we consider the problem of audio-visual synchronisation a...

2 Honglie Chen, et al. ∙

research

∙ 11/17/2021

It's About Time: Analog Clock Reading in the Wild

In this paper, we present a framework for reading analog clocks in natur...

2 Charig Yang, et al. ∙

research

∙ 09/24/2021

ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation

The objective of this work is to achieve sensorless reconstruction of a ...

14 Pak Hei Yeung, et al. ∙

research

∙ 09/07/2021

Self-supervised Tumor Segmentation through Layer Decomposition

In this paper, we propose a self-supervised approach for tumor segmentat...

10 Xiaoman Zhang, et al. ∙

research

∙ 05/26/2021

Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-Supervised Learning

The objective of this work is to segment any arbitrary structures of int...

11 Pak Hei Yeung, et al. ∙
