Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization


الملخص بالإنكليزية

Pedestrian attribute recognition in surveillance scenarios is still a challenging task due to inaccurate localization of specific attributes. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which relies on the strong relevance between attributes and views to capture specific view-attributes and to localize attribute-corresponding areas by attention mechanism. A specific view-attribute is composed by the extracted attribute feature and four view scores which are predicted by view predictor as the confidences for attribute from different views. View-attribute is then delivered back to shallow network layers for supervising deep feature extraction. To explore the location of a view-attribute, regional attention is introduced to aggregate spatial information of the input attribute feature in height and width direction for constraining the image into a narrow range. Moreover, the inter-channel dependency of view-feature is embedded in the above two spatial directions. An attention attribute-specific region is gained after fining the narrow range by balancing the ratio of channel dependencies between height and width branches. The final view-attribute recognition outcome is obtained by combining the output of regional attention with the view scores from view predictor. Experiments on three wide datasets (RAP, RAPv2, PETA, and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.

تحميل البحث