Composition-preserving Deep Photo Aesthetics Assessment

Long Mai¹, Hailin Jin², and Feng Liu¹

¹Computer Science Department, Portland State University
²Adobe Systems Inc.

Abstract
Photo aesthetics assessment is challenging. Deep convolutional neural network (ConvNet) methods have recently shown promising results for aesthetics assessment. The performance of these deep ConvNet methods, however, is often compromised by the constraint that the neural network accepts only fixed-size input. To accommodate this requirement, input images need to be transformed via cropping, scaling, or padding, which often damages image composition, reduces image resolution, or causes image distortion, thus compromising the aesthetics of the original images. In this paper, we present a composition-preserving deep ConvNet method that directly learns aesthetics features from the original input images without any image transformations. Specifically, our method adds an adaptive spatial pooling layer on top of the regular convolution and pooling layers to directly handle input images at their original sizes and aspect ratios. To allow for multi-scale feature extraction, we develop the Multi-Net Adaptive Spatial Pooling ConvNet architecture, which consists of multiple sub-networks with different adaptive spatial pooling sizes, and we leverage a scene-based aggregation layer to effectively combine the predictions from the sub-networks. Our experiments on the large-scale aesthetics assessment benchmark (AVA) demonstrate that our method can significantly improve the state-of-the-art results in photo aesthetics assessment.
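To illustrate the adaptive spatial pooling idea described in the abstract, here is a minimal sketch in plain Python. It is not the code from the package provided on this page: the function name adaptive_max_pool, the floor/ceil rule for bin boundaries, and the choice of max pooling are all assumptions made for illustration. The point it shows is that the pooling bins scale with the input, so feature maps of different sizes and aspect ratios all produce an output of the same fixed size.

```python
import math


def adaptive_max_pool(feature_map, out_h, out_w):
    """Max-pool a 2-D feature map (list of rows) into a fixed out_h x out_w grid.

    Hypothetical helper for illustration only; bin boundaries use an
    assumed floor/ceil rule, so bins may overlap when sizes do not divide evenly.
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(out_h):
        # Row range of bin i, proportional to the input height.
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            # Column range of bin j, proportional to the input width.
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A "landscape" 4x6 map and a "portrait" 6x4 map, with no cropping or scaling.
landscape = [[r * 6 + c for c in range(6)] for r in range(4)]
portrait = [[r * 4 + c for c in range(4)] for r in range(6)]

pooled_landscape = adaptive_max_pool(landscape, 2, 2)
pooled_portrait = adaptive_max_pool(portrait, 2, 2)
print(pooled_landscape)  # [[8, 11], [20, 23]]
print(pooled_portrait)   # [[9, 11], [21, 23]]
```

Both inputs yield a 2x2 output even though their shapes differ, which is what lets later fixed-size layers follow the pooling step. Calling the same function with several output sizes (for example 1x1, 2x2, and 4x4) loosely mirrors the idea of sub-networks with different adaptive spatial pooling sizes, again only as an illustrative assumption.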
Paper
Long Mai, Hailin Jin, and Feng Liu. Composition-preserving Deep Photo Aesthetics Assessment.
IEEE CVPR 2016, Las Vegas, NV, USA, June 2016. PDF
Related Papers
Long Mai, Hoang Le, Yuzhen Niu, Yu-chi Lai, and Feng Liu. Detecting Rule of Simplicity from Photos.
ACM Multimedia 2012, Nara, Japan, October 2012. (short paper) PDF Website

Yuzhen Niu and Feng Liu. What Makes a Professional Video? A Computational Aesthetics Approach. IEEE Transactions on Circuits and Systems for Video Technology. Vol. 22, Issue 7, 2012: 1037 - 1049. PDF

Long Mai, Hoang Le, Yuzhen Niu, and Feng Liu. Rule of Thirds Detection from Photograph.
IEEE ISM 2011, Dana Point, CA, USA, December 2011. PDF Website
Code
We provide the code and demo files here. If you use our code, please cite our paper [1]. This package also includes the model files from the scene recognition paper [2].
[1] Long Mai, Hailin Jin, and Feng Liu. Composition-preserving Deep Photo Aesthetics Assessment.
IEEE CVPR 2016, Las Vegas, NV, USA, June 2016. PDF

[2] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. NIPS 2014.