أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Chao Wang

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

257 - Tongkun Xu , Weihua Chen , Pichao Wang 2021

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to a different unlabeled target domain. Most existing UDA methods focus on learning domain-invariant feature representation, either from the domain l evel or category level, using convolution neural networks (CNNs)-based frameworks. One fundamental problem for the category level based UDA is the production of pseudo labels for samples in target domain, which are usually too noisy for accurate domain alignment, inevitably compromising the UDA performance. With the success of Transformer in various tasks, we find that the cross-attention in Transformer is robust to the noisy input pairs for better feature alignment, thus in this paper Transformer is adopted for the challenging UDA task. Specifically, to generate accurate input pairs, we design a two-way center-aware labeling algorithm to produce pseudo labels for target samples. Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively. Such design explicitly enforces the framework to learn discriminative domain-specific and domain-invariant representations simultaneously. The proposed method is dubbed CDTrans (cross-domain transformer), and it provides one of the first attempts to solve UDA tasks with a pure transformer solution. Extensive experiments show that our proposed method achieves the best performance on Office-Home, VisDA-2017, and DomainNet datasets.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints

110 - Zichao Wang , Andrew S. Lan , Richard G. Baraniuk 2021

We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. Existing approaches are prone to generating MWPs that are eit her mathematically invalid or have unsatisfactory language quality. They also either ignore the context or require manual specification of a problem template, which compromises the diversity of the generated MWPs. In this paper, we develop a novel MWP generation approach that leverages i) pre-trained language models and a context keyword selection model to improve the language quality of the generated MWPs and ii) an equation consistency constraint for math equations to improve the mathematical validity of the generated MWPs. Extensive quantitative and qualitative experiments on three real-world MWP datasets demonstrate the superior performance of our approach compared to various baselines.

الحساب واللغة

Scaled ReLU Matters for Training Vision Transformers

117 - Pichao Wang , Xue Wang , Hao Luo 2021

Vision transformers (ViTs) have been an alternative design paradigm to convolutional neural networks (CNNs). However, the training of ViTs is much harder than CNNs, as it is sensitive to the training parameters, such as learning rate, optimizer and w armup epoch. The reasons for training difficulty are empirically analysed in ~cite{xiao2021early}, and the authors conjecture that the issue lies with the textit{patchify-stem} of ViT models and propose that early convolutions help transformers see better. In this paper, we further investigate this problem and extend the above conclusion: only early convolutions do not help for stable training, but the scaled ReLU operation in the textit{convolutional stem} (textit{conv-stem}) matters. We verify, both theoretically and empirically, that scaled ReLU in textit{conv-stem} not only improves training stabilization, but also increases the diversity of patch tokens, thus boosting peak performance with a large margin via adding few parameters and flops. In addition, extensive experiments are conducted to demonstrate that previous ViTs are far from being well trained, further showing that ViTs have great potential to be a better substitute of CNNs.

الرؤية الحاسوبية وتمييز الأنماط

Indexing Context-Sensitive Reachability

463 - Qingkai Shi , Yongchao Wang , Charles Zhang 2021

Many context-sensitive data flow analyses can be formulated as a variant of the all-pairs Dyck-CFL reachability problem, which, in general, is of sub-cubic time complexity and quadratic space complexity. Such high complexity significantly limits the scalability of context-sensitive data flow analysis and is not affordable for analyzing large-scale software. This paper presents textsc{Flare}, a reduction from the CFL reachability problem to the conventional graph reachability problem for context-sensitive data flow analysis. This reduction allows us to benefit from recent advances in reachability indexing schemes, which often consume almost linear space for answering reachability queries in almost constant time. We have applied our reduction to a context-sensitive alias analysis and a context-sensitive information-flow analysis for C/C++ programs. Experimental results on standard benchmarks and open-source software demonstrate that we can achieve orders of magnitude speedup at the cost of only moderate space to store the indexes. The implementation of our approach is publicly available.

الحساب واللغة لغات البرمجة

Projected entangled pair states study of anisotropic-exchange magnets on triangular lattice

84 - Meng Zhang , Chao Wang , Shaojun Dong 2021

The anisotropic-exchange spin-1/2 model on a triangular lattice has been used to describe the rare-earth chalcogenides, which may have exotic ground states. We investigate the quantum phase diagram of the model by using the projected entangled pair s tate (PEPS) method, and compare it to the classical phase diagram. Besides two stripe-ordered phase, and the 120$^circ$ state, there is also a multi-textbf{Q} phase. We identify the multi-textbf{Q} phase as a $Z_{2}$ vortex state. No quantum spin liquid state is found in the phase diagram, contrary to the previous DMRG calculations.

الإلكترونات المرتبطة بشدة علم المواد

Structure-Aware Feature Generation for Zero-Shot Learning

213 - Lianbo Zhang , Shaoli Huang , Xinchao Wang 2021

Zero-Shot Learning (ZSL) targets at recognizing unseen categories by leveraging auxiliary information, such as attribute embedding. Despite the encouraging results achieved, prior ZSL approaches focus on improving the discriminant power of seen-class features, yet have largely overlooked the geometric structure of the samples and the prototypes. The subsequent attribute-based generative adversarial network (GAN), as a result, also neglects the topological information in sample generation and further yields inferior performances in classifying the visual features of unseen classes. In this paper, we introduce a novel structure-aware feature generation scheme, termed as SA-GAN, to explicitly account for the topological structure in learning both the latent space and the generative networks. Specifically, we introduce a constraint loss to preserve the initial geometric structure when learning a discriminative latent space, and carry out our GAN training with additional supervising signals from a structure-aware discriminator and a reconstruction module. The former supervision distinguishes fake and real samples based on their affinity to class prototypes, while the latter aims to reconstruct the original feature space from the generated latent space. This topology-preserving mechanism enables our method to significantly enhance the generalization capability on unseen-classes and consequently improve the classification performance. Experiments on four benchmarks demonstrate that the proposed approach consistently outperforms the state of the art. Our code can be found in the supplementary material and will also be made publicly available.

الرؤية الحاسوبية وتمييز الأنماط

Hierarchical Structural Analysis Method for Complex Equation-oriented Models

269 - Chao Wang , Li Wan , Tifan Xiong 2021

Structural analysis is a method for verifying equation-oriented models in the design of industrial systems. Existing structural analysis methods need flattening the hierarchical models into an equation system for analysis. However, the large-scale eq uations in complex models make the structural analysis difficult. Aimed to address the issue, this study proposes a hierarchical structural analysis method by exploring the relationship between the singularities of the hierarchical equation-oriented model and its components. This method obtains the singularity of a hierarchical equation-oriented model by analyzing the dummy model constructed with the parts from the decomposing results of its components. Based on this, the structural singularity of a complex model can be obtained by layer-by-layer analysis according to their natural hierarchy. The hierarchical structural analysis method can reduce the equation scale in each analysis and achieve efficient structural analysis of very complex models. This method can be adaptively applied to nonlinear algebraic and differential-algebraic equation models. The main algorithms, application cases, and comparison with the existing methods are present in the paper. Complexity analysis results show the enhanced efficiency of the proposed method in structural analysis of complex equation-oriented models. As compared with the existing methods, the time complexity of the proposed method is improved significantly.

علوم الكمبيوتر

BORM: Bayesian Object Relation Model for Indoor Scene Recognition

92 - Liguang Zhou , Jun Cen , Xingchao Wang 2021

Scene recognition is a fundamental task in robotic perception. For human beings, scene recognition is reasonable because they have abundant object knowledge of the real world. The idea of transferring prior object knowledge from humans to scene recog nition is significant but still less exploited. In this paper, we propose to utilize meaningful object representations for indoor scene representation. First, we utilize an improved object model (IOM) as a baseline that enriches the object knowledge by introducing a scene parsing algorithm pretrained on the ADE20K dataset with rich object categories related to the indoor scene. To analyze the object co-occurrences and pairwise object relations, we formulate the IOM from a Bayesian perspective as the Bayesian object relation model (BORM). Meanwhile, we incorporate the proposed BORM with the PlacesCNN model as the combined Bayesian object relation model (CBORM) for scene recognition and significantly outperforms the state-of-the-art methods on the reduced Places365 dataset, and SUN RGB-D dataset without retraining, showing the excellent generalization ability of the proposed method. Code can be found at https://github.com/hszhoushen/borm.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Boundary Knowledge Translation based Reference Semantic Segmentation

143 - Lechao Cheng , Zunlei Feng , Xinchao Wang 2021

Given a reference object of an unknown type in an image, human observers can effortlessly find the objects of the same category in another image and precisely tell their visual boundaries. Such visual cognition capability of humans seems absent from the current research spectrum of computer vision. Existing segmentation networks, for example, rely on a humongous amount of labeled data, which is laborious and costly to collect and annotate; besides, the performance of segmentation networks tend to downgrade as the number of the category increases. In this paper, we introduce a novel Reference semantic segmentation Network (Ref-Net) to conduct visual boundary knowledge translation. Ref-Net contains a Reference Segmentation Module (RSM) and a Boundary Knowledge Translation Module (BKTM). Inspired by the human recognition mechanism, RSM is devised only to segment the same category objects based on the features of the reference objects. BKTM, on the other hand, introduces two boundary discriminator branches to conduct inner and outer boundary segmentation of the target objectin an adversarial manner, and translate the annotated boundary knowledge of open-source datasets into the segmentation network. Exhaustive experiments demonstrate that, with tens of finely-grained annotated samples as guidance, Ref-Net achieves results on par with fully supervised methods on six datasets.

الرؤية الحاسوبية وتمييز الأنماط

Edge-competing Pathological Liver Vessel Segmentation with Limited Labels

103 - Zunlei Feng , Zhonghua Wang , Xinchao Wang 2021

The microvascular invasion (MVI) is a major prognostic factor in hepatocellular carcinoma, which is one of the malignant tumors with the highest mortality rate. The diagnosis of MVI needs discovering the vessels that contain hepatocellular carcinoma cells and counting their number in each vessel, which depends heavily on experiences of the doctor, is largely subjective and time-consuming. However, there is no algorithm as yet tailored for the MVI detection from pathological images. This paper collects the first pathological liver image dataset containing 522 whole slide images with labels of vessels, MVI, and hepatocellular carcinoma grades. The first and essential step for the automatic diagnosis of MVI is the accurate segmentation of vessels. The unique characteristics of pathological liver images, such as super-large size, multi-scale vessel, and blurred vessel edges, make the accurate vessel segmentation challenging. Based on the collected dataset, we propose an Edge-competing Vessel Segmentation Network (EVS-Net), which contains a segmentation network and two edge segmentation discriminators. The segmentation network, combined with an edge-aware self-supervision mechanism, is devised to conduct vessel segmentation with limited labeled patches. Meanwhile, two discriminators are introduced to distinguish whether the segmented vessel and background contain residual features in an adversarial manner. In the training stage, two discriminators are devised tocompete for the predicted position of edges. Exhaustive experiments demonstrate that, with only limited labeled patches, EVS-Net achieves a close performance of fully supervised methods, which provides a convenient tool for the pathological liver vessel segmentation. Code is publicly available at https://github.com/zju-vipa/EVS-Net.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد