A large gap exists between fully-supervised and weakly-supervised object detection. To narrow this gap, some methods consider knowledge transfer from an additional fully-supervised dataset, but they do not fully exploit the discriminative category information in that dataset, which leads to low mAP. To solve this issue, we propose a novel category transfer framework for weakly-supervised object detection. The intuition is to fully leverage both visually-discriminative and semantically-correlated category information in the fully-supervised dataset to enhance the object-classification ability of a weakly-supervised detector. To handle overlapping category transfer, we propose a double-supervision mean teacher, which gathers common category information and bridges the domain gap between the two datasets. To handle non-overlapping category transfer, we propose a semantic graph convolutional network, which promotes the aggregation of semantic features between correlated categories. Experiments are conducted with Pascal VOC 2007 as the target weakly-supervised dataset and COCO as the source fully-supervised dataset. Our category transfer framework achieves 63.5% mAP and 80.3% CorLoc with 5 overlapping categories between the two datasets, outperforming the state-of-the-art methods. Code is available at https://github.com/MediaBrain-SJTU/CaT.
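The mean-teacher component mentioned above relies on an exponential-moving-average (EMA) update of the teacher's weights toward the student's. A minimal sketch of that update rule, using toy parameter dictionaries rather than the detector described in the abstract:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher EMA step: teacher <- alpha * teacher + (1 - alpha) * student.
    Both arguments are dicts mapping parameter names to arrays."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

# toy parameters standing in for detector weights
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # each entry moves 10% of the way toward the student: [0.1 0.1 0.1]
```

The `alpha` value and parameter names here are illustrative; the actual framework's hyperparameters are not given in the abstract.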
Similarity in shape between the initial mass function (IMF) and the core mass functions (CMFs) in star-forming regions prompts the idea that the IMF originates from the CMF through a self-similar core-to-star mass mapping process. To accurately determine the shape of the CMF, we create a sample of 8,431 cores with the dust continuum maps of the Cygnus X giant molecular cloud complex, and design a procedure for deriving the CMF that accounts for mass uncertainty, binning uncertainty, sample incompleteness, and statistical errors. The resultant CMF coincides well with the IMF for core masses from a few $M_\odot$ up to the highest mass of 1300 $M_\odot$, following a power law ${\rm d}N/{\rm d}M \propto M^{-2.30\pm0.04}$, but does not present an obvious flattened turnover in the low-mass range as the IMF does. More detailed inspection reveals that the slope of the CMF steepens with increasing mass. Given the numerous high-mass star-forming activities of Cygnus X, this is in stark contrast with the existing top-heavy CMFs found in high-mass star-forming clumps. We also find that the similarity between the IMF and the mass function of cloud structures is not unique at core scales, but can be seen for cloud structures of up to several pc scales. Finally, our SMA observations toward a subset of the cores do not present evidence for the self-similar mapping. The latter two results indicate that the shape of the IMF may not be directly inherited from the CMF.
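A power-law slope like the ${\rm d}N/{\rm d}M \propto M^{-2.30}$ quoted above can be estimated from a mass sample by log-spaced binning and a least-squares fit in log-log space. A minimal illustrative sketch on synthetic masses (not the Cygnus X catalog; the bin edges and count threshold are arbitrary choices, and the paper's actual procedure handles uncertainties and incompleteness far more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 2.3                      # target slope of dN/dM ∝ M^-gamma
m_min = 1.0
u = rng.uniform(size=200_000)
# inverse-CDF sampling of a power law truncated below at m_min
masses = m_min * (1.0 - u) ** (-1.0 / (gamma - 1.0))

# log-spaced bins; dN/dM = counts per linear bin width
edges = np.logspace(0, 3, 25)
counts, _ = np.histogram(masses, bins=edges)
widths = np.diff(edges)
centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
good = counts > 10                           # drop noisy, nearly empty bins
slope, _ = np.polyfit(np.log10(centers[good]),
                      np.log10(counts[good] / widths[good]), 1)
print(f"fitted slope: {slope:.2f}")          # close to -2.3
```

With Poisson noise in the sparsely populated high-mass bins, the recovered slope scatters around the input value, which is why the abstract's quoted uncertainty of $\pm0.04$ requires the more careful error treatment the paper describes.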
Molecular clouds have complex density structures produced by processes including turbulence and gravity. We propose a triangulation-based method to dissect the density structure of a molecular cloud and study the interactions between dense cores and their environments. In our approach, a Delaunay triangulation is constructed, which consists of edges connecting these cores. Starting from this construction, we study the physical connections between neighboring dense cores and the ambient environment in a systematic fashion. We apply our method to the Cygnus-X massive GMC complex and find that the core separation is related to the mean surface density by $\Sigma_{\rm edge} \propto l_{\rm core}^{-0.28}$, which can be explained by fragmentation controlled by a scale-dependent turbulent pressure (where the pressure is a function of scale, e.g. $p \sim l^{2/3}$). We also find that the masses of low-mass cores ($M_{\rm core} < 10\,M_\odot$) are determined by fragmentation, whereas massive cores ($M_{\rm core} > 10\,M_\odot$) grow mostly through accretion. The transition from fragmentation to accretion coincides with the transition from a log-normal core mass function (CMF) to a power-law CMF. By constructing surface density profiles measured along edges that connect neighboring cores, we find evidence that the massive cores have accreted a significant fraction of gas from their surroundings and thus depleted the gas reservoir. Our analysis reveals a picture where cores form through fragmentation controlled by scale-dependent turbulent pressure support, followed by accretion onto the massive cores, and the method can be applied to different regions to achieve a deeper understanding in the future.
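The edge construction this abstract describes can be sketched with SciPy's Delaunay triangulation: build the triangulation over core positions, extract the unique edges, and measure the separations along them. The 2-D toy positions below are placeholders, not the Cygnus-X core catalog:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
cores = rng.uniform(0, 10, size=(50, 2))   # toy 2-D core positions (e.g. in pc)

tri = Delaunay(cores)

# collect the unique edges from the triangle simplices
edges = set()
for a, b, c in tri.simplices:
    edges.update({tuple(sorted((a, b))),
                  tuple(sorted((b, c))),
                  tuple(sorted((a, c)))})

# core separations l_core measured along the edges
lengths = np.array([np.linalg.norm(cores[i] - cores[j]) for i, j in edges])
print(f"{len(edges)} edges, median separation {np.median(lengths):.2f}")
```

In the actual analysis each edge would additionally carry a mean surface density $\Sigma_{\rm edge}$ sampled from the column-density map along the edge, which is the quantity correlated against $l_{\rm core}$ above.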
Ze Liu, Jia Ning, Yue Cao (2021)
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks. These video models are all built on Transformer layers that globally connect patches across the spatial and temporal dimensions. In this paper, we instead advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models. Our approach achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including action recognition (84.9 top-1 accuracy on Kinetics-400 and 86.1 top-1 accuracy on Kinetics-600 with ~20x less pre-training data and ~3x smaller model size) and temporal modeling (69.6 top-1 accuracy on Something-Something v2). The code and models will be made publicly available at https://github.com/SwinTransformer/Video-Swin-Transformer.
Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. Challenges exist due to the complex sequence--fold relationship, as well as the difficulty of capturing the diversity of sequences (and therefore structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence--structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design, when compared to existing state-of-the-art methods that include data-driven deep generative models and physics-based RosettaDesign. The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.
Yue Cao, Xiaohe Wu, Shuran Qi (2021)
The success of deep denoisers on real-world color photographs usually relies on the modeling of sensor noise and the in-camera signal processing (ISP) pipeline. A performance drop inevitably occurs when the sensor and ISP pipeline of test images are different from those used to train the deep denoisers (i.e., noise discrepancy). In this paper, we present an unpaired learning scheme to adapt a color image denoiser for handling test images with noise discrepancy. We consider a practical training setting, i.e., a pre-trained denoiser, a set of test noisy images, and an unpaired set of clean images. To begin with, the pre-trained denoiser is used to generate pseudo clean images for the test images. Pseudo-ISP is then suggested to jointly learn the pseudo ISP pipeline and a signal-dependent rawRGB noise model using the pairs of test and pseudo clean images. We further apply the learned pseudo ISP and rawRGB noise model to clean color images to synthesize realistic noisy images for denoiser adaptation. Pseudo-ISP is effective in synthesizing realistic noisy sRGB images, and improved denoising performance can be achieved by alternating between Pseudo-ISP training and denoiser adaptation. Experiments show that our Pseudo-ISP not only can boost a simple Gaussian blurring-based denoiser to achieve competitive performance against CBDNet, but also is effective in improving state-of-the-art deep denoisers, e.g., CBDNet and RIDNet.
In this paper we give two characterizations of the $p \times q$-grid graphs as co-edge-regular graphs with four distinct eigenvalues.
In this paper, we discuss maximality of Seidel matrices with a fixed largest eigenvalue. We present a classification of maximal Seidel matrices of largest eigenvalue $3$, which gives a classification of maximal equiangular lines in a Euclidean space with angle $\arccos 1/3$. Motivated by the maximality of the exceptional root system $E_8$, we define strong maximality of a Seidel matrix, and show that every Seidel matrix achieving the absolute bound is strongly maximal.
We propose ParaSCI, the first large-scale paraphrase dataset in the scientific field, including 33,981 paraphrase pairs from ACL (ParaSCI-ACL) and 316,063 pairs from arXiv (ParaSCI-arXiv). Digging into the characteristics and common patterns of scientific papers, we construct this dataset through intra-paper and inter-paper methods, such as collecting citations to the same paper or aggregating definitions by scientific terms. To take advantage of partially paraphrased sentences, we propose PDBERT as a general paraphrase discovery method. The major advantages of paraphrases in ParaSCI lie in their prominent length and textual diversity, which is complementary to existing paraphrase datasets. ParaSCI obtains satisfactory results on human evaluation and downstream tasks, especially long paraphrase generation.
Yue Cao, Jiarui Xu, Stephen Lin (2020)
The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by the non-local network are almost the same for different query positions. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further replace the one-layer transformation function of the non-local block by a two-layer bottleneck, which further reduces the number of parameters considerably. The resulting network element, called the global context (GC) block, effectively models global context in a lightweight manner, allowing it to be applied at multiple layers of a backbone network to form a global context network (GCNet). Experiments show that GCNet generally outperforms NLNet on major benchmarks for various recognition tasks. The code and network configurations are available at https://github.com/xvjiarui/GCNet.
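The key observation, that a single attention map can be shared by all query positions, can be sketched in NumPy: one softmax over spatial positions pools the features into a global context vector, which a two-layer bottleneck transforms before it is broadcast-added to every position. This is an illustrative simplification, not the released GCNet code, and it omits normalization details inside the bottleneck:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gc_block(x, w_attn, w1, w2):
    """Simplified query-independent global context block.
    x: (C, N) features over N spatial positions."""
    attn = softmax(w_attn @ x)            # (N,) one attention map shared by all queries
    context = x @ attn                    # (C,) pooled global context vector
    hidden = np.maximum(w1 @ context, 0)  # two-layer bottleneck transform (ReLU)
    delta = w2 @ hidden                   # (C,) per-channel modulation
    return x + delta[:, None]             # broadcast-add the same delta to every position

rng = np.random.default_rng(0)
C, N, r = 8, 16, 2                        # channels, positions, bottleneck ratio
x = rng.standard_normal((C, N))
out = gc_block(x,
               rng.standard_normal(C),
               rng.standard_normal((r, C)),
               rng.standard_normal((C, r)))
print(out.shape)  # (8, 16)
```

Because `delta` does not depend on the query position, the block costs one attention map per layer instead of one per position, which is the source of the computation savings over the non-local block.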