114 - Liang Peng, Fei Liu, Zhengxu Yu, 2021
Monocular 3D detection currently suffers from significantly lower detection accuracy than LiDAR-based methods. The poor accuracy is mainly caused by the absence of reliable location cues, owing to the ill-posed nature of monocular imagery. LiDAR point clouds, which provide precise spatial measurements, can offer beneficial information for training monocular methods. To make use of LiDAR point clouds, prior works project them into depth map labels and then train a dense depth estimator to extract explicit location features. This indirect and complicated pipeline introduces intermediate products, i.e., depth map predictions, incurring substantial computational cost and leading to suboptimal performance. In this paper, we propose LPCG (LiDAR Point Cloud Guided monocular 3D object detection), a general framework for guiding the training of monocular 3D detectors with LiDAR point clouds. Specifically, we use LiDAR point clouds to generate pseudo labels, allowing monocular 3D detectors to benefit from massive, easily collected unlabeled data. LPCG works well under both supervised and unsupervised setups. Thanks to its general design, LPCG can be plugged into any monocular 3D detector and significantly boosts its performance. As a result, we rank first on the KITTI monocular 3D/BEV (bird's-eye-view) detection benchmark by a considerable margin. The code will be made publicly available soon.
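The abstract does not include code, so the following is a minimal sketch of the pseudo-labeling idea it describes: derive 3D box labels from unlabeled LiDAR sweeps and train the monocular detector on them, with no depth-map intermediate. The helpers `segment_objects` and `fit_3d_box` and the detector interface are assumptions for illustration, not the authors' implementation.

```python
def generate_pseudo_labels(lidar_points, segment_objects, fit_3d_box):
    """Turn an unlabeled LiDAR sweep into 3D box pseudo labels.

    lidar_points: (N, 3) points in the camera frame.
    segment_objects: callable returning per-object point clusters
                     (e.g., an off-the-shelf LiDAR detector or clustering).
    fit_3d_box: callable fitting an (x, y, z, w, h, l, yaw) box to a cluster.
    """
    pseudo_labels = []
    for cluster in segment_objects(lidar_points):
        if len(cluster) < 10:  # skip clusters too sparse to fit a box reliably
            continue
        pseudo_labels.append(fit_3d_box(cluster))
    return pseudo_labels

# Training-loop sketch: the monocular detector never consumes LiDAR data
# directly, only the pseudo labels derived from it, so any detector fits.
# for image, lidar in unlabeled_loader:
#     labels = generate_pseudo_labels(lidar, segment_objects, fit_3d_box)
#     loss = monocular_detector.training_step(image, labels)
```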
With the rise of deep learning methods, person Re-Identification (ReID) performance has improved tremendously on many public datasets. However, most public ReID datasets are collected within a short time window in which a person's appearance rarely changes. In real-world applications, such as a shopping mall, the same person's clothing may change, and different people may wear similar clothes. All of these cases can result in inconsistent ReID performance, revealing a critical problem: current ReID models rely heavily on a person's apparel. It is therefore critical to learn an apparel-invariant person representation for cases such as clothing changes or several people wearing similar clothes. In this work, we tackle this problem from the viewpoint of invariant feature representation learning. The main contributions of this work are as follows. (1) We propose the semi-supervised Apparel-invariant Feature Learning (AIFL) framework to learn an apparel-invariant pedestrian representation using images of the same person wearing different clothes. (2) To obtain images of the same person wearing different clothes, we propose an unsupervised apparel-simulation GAN (AS-GAN) that synthesizes clothing-change images conditioned on a target cloth embedding. It is worth noting that the images used in ReID tasks are cropped from real-world, low-quality CCTV videos, which makes synthesizing clothing-change images more challenging. We conduct extensive experiments on several datasets, comparing against several baselines. Experimental results demonstrate that our proposal improves the ReID performance of the baseline models.
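As a rough illustration of the invariance objective described above, here is a hedged sketch of one training step: a frozen apparel-simulation generator produces the same person in different clothes, and the encoder is pushed to give both images matching features. The generator interface and the cosine-distance loss are assumptions, not the paper's actual AIFL/AS-GAN code.

```python
import torch
import torch.nn.functional as F

def aifl_step(encoder, as_gan, image, cloth_embedding):
    """One semi-supervised step: pull features of the same person
    together across a simulated clothing change.

    encoder: the ReID feature extractor being trained.
    as_gan: frozen apparel-simulation generator; given an image and a
            target cloth embedding, returns the person in new clothes.
    """
    with torch.no_grad():  # the generator is fixed during this step
        changed = as_gan(image, cloth_embedding)
    feat_orig = F.normalize(encoder(image), dim=1)
    feat_changed = F.normalize(encoder(changed), dim=1)
    # Apparel-invariance loss: same identity, different clothes ->
    # minimize the cosine distance between the two features.
    return (1.0 - (feat_orig * feat_changed).sum(dim=1)).mean()
```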
Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications: a pre-trained feature extraction model is fine-tuned for the target scenario instead of training a model from scratch. It is challenging due to the significant variations within the target scenario, e.g., different camera viewpoints, illumination changes, and occlusion. These variations create a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of aggregating and utilizing the dataset's global information during mini-batch training. Specifically, we introduce a novel network structure called the Batch-related Convolutional Cell (BConv-Cell), which progressively collects the global information of the dataset into a latent state and uses it to rectify the extracted features. Based on BConv-Cells, we further propose the Progressive Transfer Learning (PTL) method, which facilitates fine-tuning by jointly optimizing the BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal greatly improves the performance of ReID models on the MSMT17, Market-1501, CUHK03, and DukeMTMC-reID datasets. Moreover, we extend our proposal to the general image classification task. Experiments on several image classification benchmark datasets demonstrate that our proposal significantly improves the performance of baseline models. The code has been released at https://github.com/ZJULearning/PTL
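The BConv-Cell is described as accumulating dataset-level information into a latent state that rectifies per-batch features. Below is a minimal GRU-style sketch of that idea; the gating form, state update, and channel sizes are assumptions made for illustration, and the authors' actual cell is in the repository linked above.

```python
import torch
import torch.nn as nn

class BConvCell(nn.Module):
    """Sketch of a batch-related cell: a persistent latent state, updated
    once per mini-batch, rectifies the extracted feature map."""

    def __init__(self, channels):
        super().__init__()
        self.update = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.state = None  # latent state carried across mini-batches

    def forward(self, feat):
        if self.state is None:
            self.state = torch.zeros_like(feat.mean(0, keepdim=True))
        state = self.state.expand_as(feat)
        x = torch.cat([feat, state], dim=1)
        z = torch.sigmoid(self.gate(x))       # how much to trust the state
        h = torch.tanh(self.update(x))
        rectified = z * h + (1 - z) * feat    # rectify feature with state
        # Fold this batch into the latent state; detach so gradients do
        # not propagate across mini-batches.
        self.state = rectified.mean(0, keepdim=True).detach()
        return rectified
```

Keeping the state as a detached running summary is what lets the cell see global dataset statistics without the memory cost of backpropagating through past batches.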