We aim to identify the salient objects in an image by applying a model of visual attention. We automate the process by predicting those objects in an image that are most likely to be the focus of someone's visual attention. Concretely, we first generate fixation maps from the eye tracking
data, which express the ground truth of people's visual attention for each training image. Then, we extract high-level features based on a bag-of-visual-words image representation and use them, together with the fixation maps, as input to train a support vector regression model. With this model,
we can predict the saliency of a new query image. Our experiments show that the model provides a good estimate of human visual attention in test image sets containing either a single salient object or multiple salient objects. In this way, we seek to reduce the redundant information within the
scene, and thus provide a more accurate depiction of the scene.
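The pipeline described above — bag-of-visual-words features paired with saliency scores derived from fixation maps, used to train a support vector regression model — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: synthetic random descriptors stand in for real local features and eye-tracking data, and the vocabulary size and SVR settings are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-ins: each "image" yields a set of local descriptors
# (in practice, e.g. SIFT), and a scalar saliency score that would be
# derived from a ground-truth fixation map.
def fake_descriptors(n=50, dim=8):
    return rng.normal(size=(n, dim))

train_images = [fake_descriptors() for _ in range(20)]
train_saliency = rng.uniform(0, 1, size=20)  # from fixation maps

# 1. Build the visual vocabulary by clustering all training descriptors.
vocab_size = 16  # illustrative choice
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(train_images))

# 2. Encode each image as a normalized histogram of visual words.
def bovw_histogram(descriptors):
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X_train = np.array([bovw_histogram(d) for d in train_images])

# 3. Train support vector regression on (histogram, saliency) pairs.
svr = SVR(kernel="rbf", C=1.0)
svr.fit(X_train, train_saliency)

# 4. Predict the saliency of a new query image.
query = fake_descriptors()
prediction = svr.predict(bovw_histogram(query)[None, :])
```

In a real setting, the regression target would be computed per region or per image from the fixation maps, and the predicted values would be assembled into a saliency map for the query image.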
Keywords: bag of visual words; support vector regression
Document Type: Research Article
School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an, P.R. China
Centre for Computational Intelligence, De Montfort University, Leicester, UK
School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, P.R. China
Publication date: May 3, 2016