Who's Better, Who's Best: Skill Determination in Video using Deep Ranking
This paper presents a method for assessing skill of performance from video, for a variety of tasks, ranging from drawing to surgery and rolling dough. We formulate the problem as pairwise and overall ranking of video collections, and propose a supervised deep ranking model to learn discriminative features between pairs of videos exhibiting different amounts of skill. We utilise a two-stream Temporal Segment Network to capture both the type and quality of motions and the evolving task state. Results demonstrate our method is applicable to a variety of tasks, with the percentage of correctly ordered pairs of videos ranging from 70 robustness of our approach via sensitivity analysis of its parameters. We see this work as effort toward the automated and objective organisation of how-to videos and overall, generic skill determination in video.
READ FULL TEXT