ترغب بنشر مسار تعليمي؟ اضغط هنا

Crowd Counting by Self-supervised Transfer Colorization Learning and Global Prior Classification

205   0   0.0 ( 0 )
 نشر من قبل Haoyue Bai
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Labeled crowd scene images are expensive and scarce. To significantly reduce the requirement of the labeled images, we propose ColorCount, a novel CNN-based approach by combining self-supervised transfer colorization learning and global prior classification to leverage the abundantly available unlabeled data. The self-supervised colorization branch learns the semantics and surface texture of the image by using its color components as pseudo labels. The classification branch extracts global group priors by learning correlations among image clusters. Their fused resultant discriminative features (global priors, semantics and textures) provide ample priors for counting, hence significantly reducing the requirement of labeled images. We conduct extensive experiments on four challenging benchmarks. ColorCount achieves much better performance as compared with other unsupervised approaches. Its performance is close to the supervised baseline with substantially less labeled data (10% of the original one).



قيم البحث

اقرأ أيضاً

316 - Yan Liu , Lingqiao Liu , Peng Wang 2020
Most existing crowd counting systems rely on the availability of the object location annotation which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crow d counting model in semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to leverage the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale of this design is that learning the feature extractor can be more reliable and robust towards the inevitable noisy supervision generated from the unlabeled data. Also, on top of a good feature extractor, it is possible to build a density map regressor with much fewer density map annotations. Specifically, we proposed a novel semi-supervised crowd counting method which is built upon two innovative components: (1) a set of inter-related binary segmentation tasks are derived from the original density map regression task as the surrogate prediction target; (2) the surrogate target predictors are learned from both labeled and unlabeled data by utilizing a proposed self-training scheme which fully exploits the underlying constraints of these binary segmentation tasks. Through experiments, we show that the proposed method is superior over the existing semisupervised crowd counting method and other representative baselines.
The mainstream crowd counting methods usually utilize the convolution neural network (CNN) to regress a density map, requiring point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the tes ting phase, the point-level annotations are not considered to evaluate the counting accuracy, which means the point-level annotations are redundant. Hence, it is desirable to develop weakly-supervised counting methods that just rely on count level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt the CNN to regress a total count of the crowd by an image-to-count paradigm. However, having limited receptive fields for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus can not achieve satisfactory performance, limited applications in the real-word. The Transformer is a popular sequence-to-sequence prediction model in NLP, which contains a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on Transformer. We observe that the proposed TransCrowd can effectively extract the semantic crowd information by using the self-attention mechanism of Transformer. To the best of our knowledge, this is the first work to adopt a pure Transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and gains highly competitive counting performance compared with some popular fully-supervised counting methods. Code is available at https://github.com/dk-liang/TransCrowd.
Semi-supervised approaches for crowd counting attract attention, as the fully supervised paradigm is expensive and laborious due to its request for a large number of images of dense crowd scenarios and their annotations. This paper proposes a spatial uncertainty-aware semi-supervised approach via regularized surrogate task (binary segmentation) for crowd counting problems. Different from existing semi-supervised learning-based crowd counting methods, to exploit the unlabeled data, our proposed spatial uncertainty-aware teacher-student framework focuses on high confident regions information while addressing the noisy supervision from the unlabeled data in an end-to-end manner. Specifically, we estimate the spatial uncertainty maps from the teacher models surrogate task to guide the feature learning of the main task (density regression) and the surrogate task of the student model at the same time. Besides, we introduce a simple yet effective differential transformation layer to enforce the inherent spatial consistency regularization between the main task and the surrogate task in the student model, which helps the surrogate task to yield more reliable predictions and generates high-quality uncertainty maps. Thus, our model can also address the task-level perturbation problems that occur spatial inconsistency between the primary and surrogate tasks in the student model. Experimental results on four challenging crowd counting datasets demonstrate that our method achieves superior performance to the state-of-the-art semi-supervised methods.
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications. However, most previous works relied on heavy backbone networks and required prohibitive run-time consumption, which would seriously re strict their deployment scopes and cause poor scalability. To liberate these crowd counting models, we propose a novel Structured Knowledge Transfer (SKT) framework, which fully exploits the structured knowledge of a well-trained teacher network to generate a lightweight but still highly effective student network. Specifically, it is integrated with two complementary transfer modules, including an Intra-Layer Pattern Transfer which sequentially distills the knowledge embedded in layer-wise features of the teacher network to guide feature learning of the student network and an Inter-Layer Relation Transfer which densely distills the cross-layer correlation knowledge of the teacher to regularize the students feature evolutio Consequently, our student network can derive the layer-wise and cross-layer knowledge from the teacher network to learn compact yet effective features. Extensive evaluations on three benchmarks well demonstrate the effectiveness of our SKT for extensive crowd counting models. In particular, only using around $6%$ of the parameters and computation cost of original models, our distilled VGG-based models obtain at least 6.5$times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance. Our code and models are available at {url{https://github.com/HCPLab-SYSU/SKT}}.
Generalising deep networks to novel domains without manual labels is challenging to deep learning. This problem is intrinsically difficult due to unpredictable changing nature of imagery data distributions in novel domains. Pre-learned knowledge does not transfer well without making strong assumptions about the learned and the novel domains. Different methods have been studied to address the underlying problem based on different assumptions, e.g. from domain adaptation to zero-shot and few-shot learning. In this work, we address this problem by transfer clustering that aims to learn a discriminative latent space of the unlabelled target data in a novel domain by knowledge transfer from labelled related domains. Specifically, we want to leverage relative (pairwise) imagery information, which is freely available and intrinsic to a target domain, to model the target domain image distribution characteristics as well as the prior-knowledge learned from related labelled domains to enable more discriminative clustering of unlabelled target data. Our method mitigates nontransferrable prior-knowledge by self-supervision, benefiting from both transfer and self-supervised learning. Extensive experiments on four datasets for image clustering tasks reveal the superiority of our model over the state-of-the-art transfer clustering techniques. We further demonstrate its competitive transferability on four zero-shot learning benchmarks.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا