How to Train Your Dragon: Tamed Warping Network for Semantic Video Segmentation

05/04/2020
by Junyi Feng, et al.

Real-time semantic segmentation of high-resolution videos is challenging due to strict speed requirements. Recent approaches exploit inter-frame continuity to reduce redundant computation by warping feature maps across adjacent frames, greatly speeding up inference. However, their accuracy drops significantly owing to imprecise motion estimation and error accumulation. In this paper, we propose a simple and effective correction stage placed right after the warping stage, forming a framework named Tamed Warping Network (TWNet) that improves the accuracy and robustness of warping-based models. Experimental results on the Cityscapes dataset show that with the correction, the accuracy (mIoU) increases significantly from 67.3% at only a small cost in inference speed (FPS). For non-rigid categories such as "human" and "object", the IoU improvements exceed 18 percentage points.
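To make the warping step concrete, below is a minimal NumPy sketch of the core idea behind warping-based video segmentation: backward-warping a previous frame's feature map into the current frame using a per-pixel flow field with bilinear sampling. This is an illustrative assumption, not TWNet's actual implementation; the paper's correction stage and its specific motion-estimation network are not reproduced here, and the function name `warp_features` is hypothetical.

```python
import numpy as np

def warp_features(feat, flow):
    """Backward-warp a feature map with a per-pixel flow field.

    feat: (H, W, C) feature map from the previous frame.
    flow: (H, W, 2) motion vectors (dx, dy) mapping each current-frame
          pixel to its source location in the previous frame.
    Returns the (H, W, C) warped features, bilinearly interpolated.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates in the previous frame, clamped to the image.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    # Integer corners around each source coordinate.
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    # Bilinear weights, broadcast over the channel axis.
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    return ((1 - wy) * ((1 - wx) * feat[y0, x0] + wx * feat[y0, x1])
            + wy * ((1 - wx) * feat[y1, x0] + wx * feat[y1, x1]))
```

With zero flow the warp is an identity; with imprecise flow, errors like the ones TWNet's correction stage targets accumulate frame over frame, since each warped output becomes the input for the next warp.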
