أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Guosheng Lin

Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning

189 - Chi Zhang , Henghui Ding , Guosheng Lin 2021

Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data. Research literature on few-shot learning exhibits great diversity, while different algorithms often excel at different fe w-shot learning scenarios. It is therefore tricky to decide which learning strategies to use under different task conditions. Inspired by the recent success in Automated Machine Learning literature (AutoML), in this paper, we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and proffer to automate the selection from various few-shot learning designs. The goal of our work is to search for good parameter adaptation policies that are applied to different stages in the network for few-shot classification. We present a search space that covers many popular few-shot learning algorithms in the literature and develop a differentiable searching and decoding algorithm based on meta-learning that supports gradient-based optimization. We demonstrate the effectiveness of our searching-based method on multiple benchmark datasets. Extensive experiments show that our approach significantly outperforms baselines and demonstrates performance advantages over many state-of-the-art methods. Code and models will be made publicly available.

الرؤية الحاسوبية وتمييز الأنماط

Calibrating Class Activation Maps for Long-Tailed Visual Recognition

141 - Chi Zhang , Guosheng Lin , Lvlong Lai 2021

Real-world visual recognition problems often exhibit long-tailed distributions, where the amount of data for learning in different categories shows significant imbalance. Standard classification models learned on such data distribution often make bia sed predictions towards the head classes while generalizing poorly to the tail classes. In this paper, we present two effective modifications of CNNs to improve network learning from long-tailed distribution. First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers, by enforcing network prediction based on important image regions. The proposed CAMC module highlights the correlated image regions across data and reinforces the representations in these areas to obtain a better global representation for classification. Furthermore, we investigate the use of normalized classifiers for representation learning in long-tailed problems. Our empirical study demonstrates that by simply scaling the outputs of the classifier with an appropriate scalar, we can effectively improve the classification accuracy on tail classes without losing the accuracy of head classes. We conduct extensive experiments to validate the effectiveness of our design and we set new state-of-the-art performance on five benchmarks, including ImageNet-LT, Places-LT, iNaturalist 2018, CIFAR10-LT, and CIFAR100-LT.

الرؤية الحاسوبية وتمييز الأنماط

Cross-Modal Graph with Meta Concepts for Video Captioning

348 - Hao Wang , Guosheng Lin , Steven C. H. Hoi 2021

Video captioning targets interpreting the complex visual contents as text descriptions, which requires the model to fully understand video scenes including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networ ks to give object proposals and use the attention mechanism to model the relations between objects. They often miss some undefined semantic concepts of the pretrained model and fail to identify exact predicate relationships between objects. In this paper, we investigate an open research task of generating text descriptions for the given videos, and propose Cross-Modal Graph (CMG) with meta concepts for video captioning. Specifically, to cover the useful semantic concepts in video captions, we weakly learn the corresponding visual regions for text descriptions, where the associated visual regions and textual words are named cross-modal meta concepts. We further build meta concept graphs dynamically with the learned cross-modal meta concepts. We also construct holistic video-level and local frame-level video graphs with the predicted predicates to model video sequence structures. We validate the efficacy of our proposed techniques with extensive experiments and achieve state-of-the-art results on two public datasets.

الرؤية الحاسوبية وتمييز الأنماط

Learning Meta-class Memory for Few-Shot Semantic Segmentation

251 - Zhonghua Wu , Xiangxi Shi , Guosheng lin 2021

Currently, the state-of-the-art methods treat few-shot semantic segmentation task as a conditional foreground-background segmentation problem, assuming each class is independent. In this paper, we introduce the concept of meta-class, which is the met a information (e.g. certain middle-level features) shareable among all classes. To explicitly learn meta-class representations in few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings to memorize the meta-class information during the base class training and transfer to novel classes during the inference stage. Moreover, for the $k$-shot scenario, we propose a novel image quality measurement module to select images from the set of support images. A high-quality class prototype could be obtained with the weighted sum of support image features based on the quality measure. Experiments on both PASCAL-$5^i$ and COCO dataset shows that our proposed method is able to achieve state-of-the-art results in both 1-shot and 5-shot settings. Particularly, our proposed MM-Net achieves 37.5% mIoU on the COCO dataset in 1-shot setting, which is 5.1% higher than the previous state-of-the-art.

الرؤية الحاسوبية وتمييز الأنماط

Point Discriminative Learning for Unsupervised Representation Learning on 3D Point Clouds

131 - Fayao Liu , Guosheng Lin , Chuan-Sheng Foo 2021

Recently deep learning has achieved significant progress on point cloud analysis tasks. Learning good representations is of vital importance to these tasks. Most current methods rely on massive labelled data for training. We here propose a point disc riminative learning method for unsupervised representation learning on 3D point clouds, which can learn local and global geometry features. We achieve this by imposing a novel point discrimination loss on the middle level and global level point features produced in the backbone network. This point discrimination loss enforces the features to be consistent with points belonging to the shape surface and inconsistent with randomly sampled noisy points. Our method is simple in design, which works by adding an extra adaptation module and a point consistency module for unsupervised training of the encoder in the backbone network. Once trained, these two modules can be discarded during supervised training of the classifier or decoder for down-stream tasks. We conduct extensive experiments on 3D object classification, 3D part segmentation and shape reconstruction in various unsupervised and transfer settings. Both quantitative and qualitative results show that our method learns powerful representations and achieves new state-of-the-art performance.

الرؤية الحاسوبية وتمييز الأنماط

Cycle-Consistent Inverse GAN for Text-to-Image Synthesis

81 - Hao Wang , Guosheng Lin , Steven C. H. Hoi 2021

This paper investigates an open research task of text-to-image synthesis for automatically generating or manipulating images from text descriptions. Prevailing methods mainly use the text as conditions for GAN generation, and train different models f or the text-guided image generation and manipulation tasks. In this paper, we propose a novel unified framework of Cycle-consistent Inverse GAN (CI-GAN) for both text-to-image generation and text-guided image manipulation tasks. Specifically, we first train a GAN model without text input, aiming to generate images with high diversity and quality. Then we learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image, where we introduce the cycle-consistency training to learn more robust and consistent inverted latent codes. We further uncover the latent space semantics of the trained GAN model, by learning a similarity model between text representations and the latent codes. In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes. Extensive experiments on the Recipe1M and CUB datasets validate the efficacy of our proposed framework.

الرؤية الحاسوبية وتمييز الأنماط

Remember What You have drawn: Semantic Image Manipulation with Memory

391 - Xiangxi Shi , Zhonghua Wu , Guosheng Lin 2021

Image manipulation with natural language, which aims to manipulate images with the guidance of language descriptions, has been a challenging problem in the fields of computer vision and natural language processing (NLP). Currently, a number of effort s have been made for this task, but their performances are still distant away from generating realistic and text-conformed manipulated images. Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description. We propose a two-stage network with an additional reconstruction stage to learn the latent memories efficiently. To avoid the unnecessary background changes, we propose a Target Localization Unit (TLU) to focus on the manipulation of the region mentioned by the text. Moreover, to learn a robust memory, we further propose a novel randomized memory training loss. Experiments on the four popular datasets show the better performance of our method compared to the existing ones.

الرؤية الحاسوبية وتمييز الأنماط

Dense Supervision Propagation for Weakly Supervised Semantic Segmentation on 3D Point Clouds

134 - Jiacheng Wei , Guosheng Lin , Kim-Hui Yap 2021

Semantic segmentation on 3D point clouds is an important task for 3D scene understanding. While dense labeling on 3D data is expensive and time-consuming, only a few works address weakly supervised semantic point cloud segmentation methods to relieve the labeling cost by learning from simpler and cheaper labels. Meanwhile, there are still huge performance gaps between existing weakly supervised methods and state-of-the-art fully supervised methods. In this paper, we train a semantic point cloud segmentation network with only a small portion of points being labeled. We argue that we can better utilize the limited supervision information as we densely propagate the supervision signal from the labeled points to other points within and across the input samples. Specifically, we propose a cross-sample feature reallocating module to transfer similar features and therefore re-route the gradients across two samples with common classes and an intra-sample feature redistribution module to propagate supervision signals on unlabeled points across and within point cloud samples. We conduct extensive experiments on public datasets S3DIS and ScanNet. Our weakly supervised method with only 10% and 1% of labels can produce compatible results with the fully supervised counterpart.

الرؤية الحاسوبية وتمييز الأنماط

Self-Point-Flow: Self-Supervised Scene Flow Estimation from Point Clouds with Optimal Transport and Random Walk

117 - Ruibo Li , Guosheng Lin , Lihua Xie 2021

Due to the scarcity of annotated scene flow data, self-supervised scene flow learning in point clouds has attracted increasing attention. In the self-supervised manner, establishing correspondences between two point clouds to approximate scene flow i s an effective approach. Previous methods often obtain correspondences by applying point-wise matching that only takes the distance on 3D point coordinates into account, introducing two critical issues: (1) it overlooks other discriminative measures, such as color and surface normal, which often bring fruitful clues for accurate matching; and (2) it often generates sub-par performance, as the matching is operated in an unconstrained situation, where multiple points can be ended up with the same corresponding point. To address the issues, we formulate this matching task as an optimal transport problem. The output optimal assignment matrix can be utilized to guide the generation of pseudo ground truth. In this optimal transport, we design the transport cost by considering multiple descriptors and encourage one-to-one matching by mass equality constraints. Also, constructing a graph on the points, a random walk module is introduced to encourage the local consistency of the pseudo labels. Comprehensive experiments on FlyingThings3D and KITTI show that our method achieves state-of-the-art performance among self-supervised learning methods. Our self-supervised method even performs on par with some supervised learning approaches, although we do not need any ground truth flow for training.

الرؤية الحاسوبية وتمييز الأنماط

HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding

219 - Ruibo Li , Guosheng Lin , Tong He 2021

Scene flow in 3D point clouds plays an important role in understanding dynamic environments. Although significant advances have been made by deep neural networks, the performance is far from satisfactory as only per-point translational motion is cons idered, neglecting the constraints of the rigid motion in local regions. To address the issue, we propose to introduce the motion consistency to force the smoothness among neighboring points. In addition, constraints on the rigidity of the local transformation are also added by sharing unique rigid motion parameters for all points within each local region. To this end, a high-order CRFs based relation module (Con-HCRFs) is deployed to explore both point-wise smoothness and region-wise rigidity. To empower the CRFs to have a discriminative unary term, we also introduce a position-aware flow estimation module to be incorporated into the Con-HCRFs. Comprehensive experiments on FlyingThings3D and KITTI show that our proposed framework (HCRF-Flow) achieves state-of-the-art performance and significantly outperforms previous approaches substantially.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد