Learning of Proto-object Representations via Fixations on Low Resolution
While previous research in eye fixation prediction typically relies on integrating low-level features (e.g., color, edges) to form a saliency map, it has recently been found that the structural organization of these features into a proto-object representation can play a more significant role. In this work, we present a computational framework based on a deep network to demonstrate that proto-object representations can be learned from low-resolution image patches taken from fixation regions. We advocate the use of low-resolution inputs for the following reasons: (1) proto-objects are computed in parallel over the entire visual field; (2) people can perceive and recognize objects well even at low resolution; and (3) fixations on lower-resolution images can predict fixations on higher-resolution images. In the proposed computational model, we extract multi-scale image patches at fixation regions from eye fixation datasets, resize them to low resolution, and feed them into a hierarchical deep network. Through layer-wise unsupervised feature learning, we find that many proto-object-like features, responsive to object blobs of different shapes, are learned. Visualizations also show that these features are selective to potential objects in the scene, and that their responses, combined with learned weights, predict eye fixations on the images well.
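To make the described pipeline concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the first stages: cropping multi-scale patches around fixation points, downsampling them to a low resolution, and learning features from them in an unsupervised way. The patch scales, the output resolution, and the use of k-means as the unsupervised learner are assumptions chosen for illustration; the paper's hierarchical deep network and its layer-wise learning rule differ in detail.

```python
import numpy as np

def resize_bilinear(img, size):
    """Bilinear resize of a 2-D (grayscale) array to size x size, NumPy only."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, size)
    xs = np.linspace(0, w - 1, size)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy)[:, None] + bot * wy[:, None]

def extract_multiscale_patches(image, fixations, scales=(32, 64, 96), out_size=12):
    """Crop square patches at several scales around each fixation (y, x) and
    downsample each to a low-resolution out_size x out_size patch.
    The scale and resolution values here are illustrative, not from the paper."""
    H, W = image.shape
    patches = []
    for (y, x) in fixations:
        for s in scales:
            half = s // 2
            y0, y1 = max(0, y - half), min(H, y + half)
            x0, x1 = max(0, x - half), min(W, x + half)
            crop = image[y0:y1, x0:x1]
            if crop.shape[0] < 2 or crop.shape[1] < 2:
                continue
            patches.append(resize_bilinear(crop.astype(float), out_size))
    return np.stack(patches)  # shape (N, out_size, out_size)

def learn_features(patches, k=50, iters=20, seed=0):
    """Unsupervised feature learning on normalized patch vectors, here
    approximated with plain k-means as a stand-in for a single layer of
    layer-wise learning in a hierarchical network."""
    X = patches.reshape(len(patches), -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    X /= X.std(axis=1, keepdims=True) + 1e-8
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (N, k)
        assign = dists.argmin(1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers  # each row is one learned, blob-like filter

# Hypothetical usage: a random image and fixation list stand in for real data.
if __name__ == "__main__":
    image = np.random.rand(240, 320)
    fixations = [(60, 80), (120, 200), (180, 150)]
    patches = extract_multiscale_patches(image, fixations)
    filters = learn_features(patches, k=10)
    print(patches.shape, filters.shape)
```

In the full model, the responses of such learned features over an image would be combined with learned weights to form a fixation prediction map; that stage is omitted here.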