CNN-based Facial Affect Analysis on Mobile Devices
This paper focuses on the design, deployment and evaluation of Convolutional Neural Network (CNN) architectures for facial affect analysis on mobile devices. Unlike traditional CNN approaches, models deployed to mobile devices must minimise storage requirements while retaining high performance. We therefore propose three variants of established CNN architectures and comparatively evaluate them on a large, in-the-wild benchmark dataset of facial images. Our results show that the proposed architectures retain similar performance to the dataset baseline while minimising storage requirements, achieving 58% accuracy for discrete emotion classification and an RMSE of 0.39 for valence/arousal prediction. To demonstrate the feasibility of deploying these models in real-world applications, we implement a music recommendation interface based on predicted user affect. Although the CNN models were not trained in the context of music recommendation, our case study shows that: (i) the trained models achieve prediction performance similar to that on the benchmark dataset, and (ii) users tend to rate the song recommendations provided by the interface positively. The average runtime of the deployed models on an iPhone 6S equates to 45 fps, suggesting that the proposed architectures are also well suited to real-time deployment on video streams.
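The abstract does not detail the proposed architectures, but the design constraint it describes, a compact network serving both discrete emotion classification and continuous valence/arousal regression under a small storage budget, can be illustrated with a minimal sketch. The Keras code below is an assumption rather than the authors' model: the depthwise-separable blocks, input resolution, layer widths and the name `build_mobile_affect_net` are all hypothetical. Depthwise-separable convolutions are one common way to cut parameter count, which is what dominates on-device model size.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_mobile_affect_net(input_shape=(96, 96, 3), num_emotions=8):
    """Compact CNN with a shared backbone and two heads:
    discrete emotion classification and valence/arousal regression.
    All architectural choices here are illustrative assumptions."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    # Depthwise-separable blocks keep the parameter count (and hence
    # on-device storage) low relative to standard convolutions.
    for filters in (32, 64, 128, 256):
        x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Head 1: categorical emotion prediction over num_emotions classes.
    emotion = layers.Dense(num_emotions, activation="softmax", name="emotion")(x)
    # Head 2: continuous valence/arousal, each constrained to [-1, 1].
    va = layers.Dense(2, activation="tanh", name="valence_arousal")(x)

    model = models.Model(inputs, [emotion, va])
    model.compile(
        optimizer="adam",
        loss={"emotion": "sparse_categorical_crossentropy",
              "valence_arousal": "mse"},
    )
    return model
```

A model of this shape can be converted with standard tooling (e.g. Core ML converters) for on-device inference; the abstract's 45 fps figure refers to the authors' own deployed models, not to this sketch.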
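The abstract likewise does not specify how predicted affect drives song selection. One plausible scheme, shown purely as a hypothetical sketch, is nearest-neighbour matching against a catalogue of songs annotated with valence/arousal coordinates; the catalogue entries and the function `recommend_song` below are invented for illustration and are not the paper's interface.

```python
import numpy as np

# Hypothetical catalogue: each song annotated with (valence, arousal) in [-1, 1].
SONGS = {
    "upbeat_pop_track": (0.8, 0.7),
    "calm_acoustic_track": (0.6, -0.4),
    "melancholic_piano_track": (-0.5, -0.6),
    "aggressive_rock_track": (-0.3, 0.8),
}

def recommend_song(valence: float, arousal: float) -> str:
    """Return the song whose affect annotation is nearest (Euclidean
    distance) to the user's predicted valence/arousal."""
    user = np.array([valence, arousal])
    return min(SONGS, key=lambda s: np.linalg.norm(np.array(SONGS[s]) - user))

# A model predicting high valence and moderate arousal maps to an upbeat song.
print(recommend_song(0.7, 0.5))  # -> "upbeat_pop_track"
```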