Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment
We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG) 𝒢, represented by a graph Laplacian matrix L, in which the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue λ_min(B) of a coefficient matrix B = diag(a) + μL, where a is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We prove that, after partitioning 𝒢 into Q sub-graphs {𝒢^q}_{q=1}^Q, the smallest Gershgorin circle theorem (GCT) lower bound of the Q corresponding coefficient matrices, min_q λ^-_min(B^q), is a lower bound for λ_min(B). This inspires a fast graph sampling algorithm that iteratively partitions 𝒢 into Q sub-graphs using Q samples (keyframes), while maximizing λ^-_min(B^q) for each sub-graph 𝒢^q. Experimental results show that our algorithm achieves video summarization performance comparable to state-of-the-art methods, at substantially reduced complexity.
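The quantities in the abstract can be illustrated numerically. The sketch below is not the authors' algorithm: it only builds a path-graph Laplacian, forms the coefficient matrix diag(a) + μL for a binary selection vector a, and compares the smallest eigenvalue for two keyframe choices. The edge weights, the value of μ, and all function names are assumptions made for illustration. For the partition property, the exact smallest eigenvalue of each sub-graph matrix is used as a stand-in for the paper's GCT lower bound, whose alignment step is not reproduced here; the plain, unaligned Gershgorin bound is included only to show how loose it is.

```python
import numpy as np

def path_laplacian(weights):
    """Laplacian of a path graph; weights[i] joins node i and node i + 1."""
    w = np.asarray(weights, dtype=float)
    n = w.size + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = w
    W[idx + 1, idx] = w
    return np.diag(W.sum(axis=1)) - W

def coefficient_matrix(weights, a, mu):
    """diag(a) + mu * L for a binary keyframe selection vector a."""
    return np.diag(np.asarray(a, dtype=float)) + mu * path_laplacian(weights)

def lambda_min(B):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(B)[0])

def gershgorin_lower_bound(B):
    """Plain (unaligned) Gershgorin bound: min over rows of center - radius."""
    radii = np.abs(B).sum(axis=1) - np.abs(np.diag(B))
    return float(np.min(np.diag(B) - radii))

# Illustrative data (assumed): 8 frames, with a weak link between frames 3 and 4.
weights = [0.9, 0.8, 0.9, 0.1, 0.9, 0.8, 0.9]
mu = 0.5

a_spread = [0, 1, 0, 0, 0, 1, 0, 0]     # one keyframe on each side of the weak link
a_clustered = [1, 1, 0, 0, 0, 0, 0, 0]  # both keyframes at the start

B_spread = coefficient_matrix(weights, a_spread, mu)
B_clustered = coefficient_matrix(weights, a_clustered, mu)

# Partition into two sub-graphs (frames 0-3 and 4-7), dropping the cut edge.
B_sub = [
    coefficient_matrix(weights[0:3], a_spread[0:4], mu),
    coefficient_matrix(weights[4:7], a_spread[4:8], mu),
]
partition_bound = min(lambda_min(Bq) for Bq in B_sub)

print("lambda_min, spread keyframes:   ", lambda_min(B_spread))
print("lambda_min, clustered keyframes:", lambda_min(B_clustered))
print("min over sub-graphs:            ", partition_bound)
print("unaligned Gershgorin bound:     ", gershgorin_lower_bound(B_spread))
```

The sub-graph minimum never exceeds the smallest eigenvalue of the full matrix because the full matrix equals the block-diagonal sub-graph matrix plus μ times the Laplacian of the removed cut edges, which is positive semidefinite. The unaligned Gershgorin bound comes out at essentially zero whenever any frame is unsampled, which suggests why a tighter, aligned bound is needed in practice.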