David A. Ross

research

∙ 06/13/2023

AVIS: Autonomous Visual Information Seeking with Large Language Models

In this paper, we propose an autonomous information seeking visual quest...

0 Ziniu Hu, et al. ∙

research

∙ 06/02/2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Observing the close relationship among panoptic, semantic and instance s...

0 Xiuye Gu, et al. ∙

research

∙ 02/02/2023

IC^3: Image Captioning by Committee Consensus

If you ask a human to describe an image, they might do so in a thousand ...

0 David M. Chan, et al. ∙

research

∙ 12/20/2022

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Detecting actions in untrimmed videos should not be limited to a small, ...

0 Vivek Rathod, et al. ∙

research

∙ 12/10/2022

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

In this paper, we propose an end-to-end Retrieval-Augmented Visual Langu...

0 Ziniu Hu, et al. ∙

research

∙ 09/15/2022

Distribution Aware Metrics for Conditional Natural Language Generation

Traditional automated metrics for evaluating conditional natural languag...

12 David M. Chan, et al. ∙

research

∙ 05/12/2022

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

While there have been significant gains in the field of automated video ...

6 David M. Chan, et al. ∙

research

∙ 01/21/2021

Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

In this paper, we present a transformer-based learning framework for 3D ...

0 Ruilong Li, et al. ∙

research

∙ 07/29/2020

Learning Video Representations from Textual Web Supervision

Videos found on the Internet are paired with pieces of text, such as tit...

5 Jonathan C. Stroud, et al. ∙

research

∙ 07/27/2020

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

Automatic video captioning aims to train models to generate text descrip...

11 David M. Chan, et al. ∙

research

∙ 07/24/2020

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

Detecting objects in 3D LiDAR data is a core technology for autonomous d...

35 Rui Huang, et al. ∙

research

∙ 05/01/2020

The AVA-Kinetics Localized Human Actions Video Dataset

This paper describes the AVA-Kinetics localized human actions video data...

6 Ang Li, et al. ∙

research

∙ 12/19/2018

D3D: Distilled 3D Networks for Video Action Recognition

State-of-the-art methods for video action recognition commonly use an en...

10 Jonathan C. Stroud, et al. ∙

research

∙ 04/20/2018

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

We propose TAL-Net, an improved approach to temporal action localization...

0 Yu-Wei Chao, et al. ∙

research

∙ 05/23/2017

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

This paper introduces a video dataset of spatio-temporally localized Ato...

0 Chunhui Gu, et al. ∙

