Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

11/27/2017
by   Ryota Hinami, et al.
0

We address the problem of open-vocabulary object retrieval and localization, which is to retrieve and localize objects from a very large-scale image database immediately by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple yet strong framework for open-vocabulary object detection. Query-Adaptive R-CNN is a simple extension of Faster R-CNN from closed-vocabulary to open-vocabulary object detection: instead of learning a class-specific classifier and regressor, we learn a detector generator that transforms a text into classifier and regressor weights. All of its components can be learned in an end-to-end manner. Even with its simple architecture, it outperforms all state-of-the-art methods in the Flickr30k Entities phrase localization task. In addition, we propose negative phrase augmentation, a generic approach for exploiting hard negatives in the training of open-vocabulary object detection that significantly improves the discriminative ability of the generated classifier. We show that our system can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro