Research papers and master's and doctoral theses published by Ning Xu

TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation

145 - Jinyu Yang, Jingjing Liu, Ning Xu 2021

Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. Despite the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and the performance can be further improved by incorporating adversarial adaptation. Nevertheless, directly using CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits (e.g., attention mechanism and sequential image representation), which play an important role in knowledge transfer. To remedy this, we propose a unified framework, namely Transferable Vision Transformer (TVT), to fully exploit the transferability of ViT for domain adaptation. Specifically, we devise a novel and effective unit, which we term the Transferability Adaption Module (TAM). By injecting learned transferabilities into attention blocks, TAM compels ViT to focus on both transferable and discriminative features. In addition, we leverage discriminative clustering to enhance feature diversity and separation, which are undermined during adversarial domain alignment. To verify its versatility, we perform extensive studies of TVT on four benchmarks, and the experimental results demonstrate that TVT attains significant improvements compared to existing state-of-the-art UDA methods.

Computer Vision and Pattern Recognition

Active-set algorithms based statistical inference for shape-restricted generalized additive Cox regression models

111 - Geng Deng, Guangning Xu, Qiang Fu 2021

Shape-restricted inference has recently gained popularity in the statistical and econometric literature as a way to relax the linear or quadratic covariate effect in regression analyses. Typical shape restrictions on a covariate effect include monotonic increase or decrease, convexity, and concavity. In this paper, we introduce shape-restricted inference to the celebrated Cox regression model (SR-Cox), in which the covariate response is modeled as shape-restricted additive functions. The SR-Cox regression approximates the shape-restricted functions using a spline basis expansion with a data-driven choice of knots. The underlying minimization of the negative log-likelihood function is formulated as a convex optimization problem, which is solved with an active-set optimization algorithm. The highlight of this algorithm is that it eliminates superfluous knots automatically. When covariate effects include combinations of convex or concave terms with unknown forms and linear terms, the most interesting finding is that SR-Cox produces accurate linear covariate effect estimates, comparable to the maximum partial likelihood estimates obtained when the forms are known. We conclude that concave or convex SR-Cox models could significantly improve nonlinear covariate response recovery and model goodness of fit.
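
As a rough illustration of the objective being minimized, the sketch below evaluates the standard Cox negative log partial likelihood for a given per-subject predictor. It is not the SR-Cox algorithm: the spline basis, the shape constraints and the active-set solver are all omitted, and the function name `cox_neg_log_partial_likelihood` and its arguments are assumptions made for this example. It also assumes there are no tied event times.

```python
import numpy as np

def cox_neg_log_partial_likelihood(eta, time, event):
    """Negative log partial likelihood of a Cox model (no tie handling).

    eta   : predictor value per subject (in an additive model, the sum of
            the covariate-effect functions evaluated for that subject)
    time  : observed follow-up time per subject
    event : 1 if the event was observed, 0 if the subject was censored
    """
    eta = np.asarray(eta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)

    # Sort by decreasing time so that the risk set of subject i
    # (everyone still under observation at time_i) is a prefix.
    order = np.argsort(-time)
    eta_sorted = eta[order]
    event_sorted = event[order]

    # Running log-sum-exp gives log(sum of exp(eta)) over each risk set.
    log_risk = np.logaddexp.accumulate(eta_sorted)

    # Only subjects with an observed event contribute a term.
    return -np.sum(event_sorted * (eta_sorted - log_risk))
```

With all predictors equal to zero and three uncensored subjects, the risk sets have sizes 3, 2 and 1, so the value is log 3 + log 2 + log 1 = log 6.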

Methodology, Machine Learning

Learning by Planning: Language-Guided Global Image Editing

171 - Jing Shi, Ning Xu, Yihang Xu 2021

Recently, language-guided global image editing has drawn increasing attention owing to its growing application potential. However, previous GAN-based methods are not only confined to domain-specific, low-resolution data but also lack interpretability. To overcome these difficulties, we develop a text-to-operation model that maps a vague editing request expressed in language into a series of editing operations, e.g., changes to contrast, brightness, and saturation. Each operation is interpretable and differentiable. Furthermore, the only supervision in the task is the target image, which is insufficient for stable training of sequential decisions. Hence, we propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth. Comparison experiments on the newly collected MA5k-Req dataset and the GIER dataset show the advantages of our method. Code is available at https://jshi31.github.io/T2ONet.
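
To make the idea of an editing request as a series of operations concrete, here is a minimal sketch that applies a list of (operation, parameter) pairs to an image. The operation definitions, the names `OPERATIONS` and `apply_plan`, and the parameter conventions are assumptions for illustration only; they are not taken from T2ONet, and this NumPy version carries no gradients.

```python
import numpy as np

def adjust_brightness(img, delta):
    """Shift every pixel value by delta (img is float RGB in [0, 1])."""
    return np.clip(img + delta, 0.0, 1.0)

def adjust_contrast(img, factor):
    """Scale pixel values around the global image mean by factor."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def adjust_saturation(img, factor):
    """Blend each pixel with its gray value; factor 0 yields grayscale."""
    gray = img.mean(axis=-1, keepdims=True)
    return np.clip(gray + (img - gray) * factor, 0.0, 1.0)

# Hypothetical operation table; each entry maps a name to a function
# taking (image, scalar parameter).
OPERATIONS = {
    "brightness": adjust_brightness,
    "contrast": adjust_contrast,
    "saturation": adjust_saturation,
}

def apply_plan(img, plan):
    """Apply a sequence of (operation name, parameter) pairs in order."""
    out = img
    for name, param in plan:
        out = OPERATIONS[name](out, param)
    return out
```

A plan such as `[("brightness", 0.1), ("saturation", 0.5)]` is readable step by step, which is the sense in which an operation sequence is easier to interpret than a single opaque image-to-image mapping.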

Computer Vision and Pattern Recognition

54 - Ruoyang Mo, Qinyi Liao, Ning Xu 2021

Unlike previous models of self-propelled particles, we develop a method to propel the particles with a constant average velocity instead of a constant force. This constant propulsion velocity (CPV) approach is validated by its agreement with the conventional constant propulsion force (CPF) approach in the flowing regime. However, the CPV approach has the advantage of accessing quasistatic flows of yield stress fluids with a vanishing propulsion velocity, which the CPF approach usually cannot do because of finite system size. Exploiting this advantage, we realize cyclic self-propulsion and study the evolution of the propulsion force with propelled particle displacement, both in the quasistatic flow regime. By mapping shear stress and shear rate to propulsion force and propulsion velocity, we find that self-propelled systems exhibit rheological behaviors similar to those of sheared systems, including the yield force gap between the CPF and CPV approaches, propulsion force overshoot, a reversible-irreversible transition under cyclic propulsion, and propulsion bands in plastic flows. These similarities suggest underlying connections between self-propulsion and shear, although they act on systems in different ways.

Soft Condensed Matter

A Hybrid Pricing and Cutting Approach for the Multi-Shift Full Truckload Vehicle Routing Problem

90 - Ning Xue, Ruibin Bai, Rong Qu 2020

Full truckload transportation (FTL) in the form of freight containers represents one of the most important transportation modes in international trade. Due to large volume and scale, delivery time in FTL is often less critical, but cost and service quality are crucial. Therefore, efficiently solving large-scale multi-shift FTL problems is becoming increasingly important and requires further research. In one of our earlier studies, a set covering model and a three-stage solution method were developed for a multi-shift FTL problem. This paper extends the previous work and presents a significantly more efficient approach by hybridising pricing and cutting strategies with metaheuristics (a variable neighbourhood search and a genetic algorithm). The metaheuristics are adopted to find promising columns (vehicle routes) guided by pricing, and cuts are dynamically generated to eliminate infeasible flow assignments caused by incompatible commodities. Computational experiments on real-life and artificial benchmark FTL problems show superior performance in terms of both computational time and solution quality when compared with previous MIP-based three-stage methods and two existing metaheuristics. The proposed cutting and heuristic pricing approach can efficiently solve large-scale real-life FTL problems.

Artificial Intelligence, Optimization and Control

Moiré effects in graphene–hBN heterostructures

141 - Yongping Du, Ning Xu, Xianqing Lin 2020

Encapsulating graphene in hexagonal boron nitride (hBN) has several advantages: the highest mobilities reported to date are achieved in this way, and precise nanostructuring of graphene becomes feasible through the protective hBN layers. Nevertheless, subtle effects may arise due to the differing lattice constants of graphene and hBN, and due to the twist angle between the graphene and hBN lattices. Here, we use a recently developed model which allows us to perform band structure and magnetotransport calculations of such structures, and show that with a proper account of the moiré physics, excellent agreement with experiments can be achieved, even for complicated structures such as disordered graphene or antidot lattices on a monolayer hBN with a relative twist angle. Calculations of this kind are essential to quantitative modeling of twistronic devices.

Mesoscale and Nanoscale Physics, Materials Science

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

88 - Yuxi Li, Ning Xu, Jinlong Peng 2020

In this paper, we address several inadequacies of current video object segmentation pipelines. Firstly, a cyclic mechanism is incorporated into the standard semi-supervised process to produce more robust representations. By relying on the accurate reference mask in the starting frame, we show that the error propagation problem can be mitigated. Next, we introduce a simple gradient correction module, which extends the offline pipeline to an online method while maintaining the efficiency of the former. Finally, we develop the cycle effective receptive field (cycle-ERF), based on gradient correction, to provide a new perspective for analyzing object-specific regions of interest. We conduct comprehensive experiments on the challenging DAVIS17 and Youtube-VOS benchmarks, demonstrating that the cyclic mechanism is beneficial to segmentation quality.

Computer Vision and Pattern Recognition

A Benchmark and Baseline for Language-Driven Image Editing

80 - Jing Shi, Ning Xu, Trung Bui 2020

Language-driven image editing can significantly reduce laborious image editing work and is friendly to photography novices. However, most similar work can only deal with a specific image domain or can only do global retouching. To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations. We also propose a baseline method that fully utilizes the annotations to solve this problem. Our new method treats each editing operation as a sub-module and can automatically predict operation parameters. Such an approach not only performs well on challenging user data but is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.

Computer Vision and Pattern Recognition

A New Dataset for Amateur Vocal Percussion Analysis

89 - Alejandro Delgado, SKoT McDonald, Ning Xu 2020

The imitation of percussive instruments via the human voice is a natural way for us to communicate rhythmic ideas and, for this reason, it attracts the interest of music makers. Specifically, the automatic mapping of these vocal imitations to their emulated instruments would allow creators to realistically prototype rhythms in a faster way. The contribution of this study is two-fold. Firstly, a new Amateur Vocal Percussion (AVP) dataset is introduced to investigate how people with little or no experience in beatboxing approach the task of vocal percussion. The end goal of this analysis is to help mapping algorithms generalise better between subjects and achieve higher performance. The dataset comprises a total of 9780 utterances recorded by 28 participants with fully annotated onsets and labels (kick drum, snare drum, closed hi-hat and opened hi-hat). Secondly, we conducted baseline experiments on audio onset detection with the recorded dataset, comparing the performance of four state-of-the-art algorithms in a vocal percussion context.

Audio and Speech Processing

High-Resolution Deep Image Matting

77 - Haichao Yu, Ning Xu, Zilong Huang 2020

Image matting is a key technique for image and video editing and composition. Conventionally, deep learning approaches take the whole input image and an associated trimap to infer the alpha matte using convolutional neural networks. Such approaches set the state of the art in image matting; however, they may fail in real-world matting applications due to hardware limitations, since real-world input images for matting are mostly of very high resolution. In this paper, we propose HDMatt, a first deep learning based image matting approach for high-resolution inputs. More concretely, HDMatt runs matting in a patch-based crop-and-stitch manner for high-resolution inputs, with a novel module design to address the contextual dependency and consistency issues between different patches. In contrast to vanilla patch-based inference, which computes each patch independently, we explicitly model the cross-patch contextual dependency with a newly proposed Cross-Patch Contextual module (CPC) guided by the given trimap. Extensive experiments demonstrate the effectiveness of the proposed method and its necessity for high-resolution inputs. Our HDMatt approach also sets new state-of-the-art performance on the Adobe Image Matting and AlphaMatting benchmarks and produces impressive visual results on more real-world high-resolution images.
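
The vanilla patch-based inference that the abstract uses as its point of comparison can be sketched as follows: the image is cut into tiles, each tile is processed on its own, and the results are stitched back in place. This is only the independent-patch baseline, not HDMatt's Cross-Patch Contextual module; the function name `patchwise_apply`, the single-channel input, and the non-overlapping tiling are assumptions chosen to keep the example short.

```python
import numpy as np

def patchwise_apply(image, patch_fn, patch_size):
    """Crop a 2-D image into non-overlapping tiles, apply patch_fn to each
    tile independently, and stitch the outputs back together.

    patch_fn must return an array with the same shape as the tile it
    receives. Tiles on the right and bottom edges may be smaller than
    patch_size.
    """
    height, width = image.shape
    out = np.zeros((height, width), dtype=np.float64)
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            tile = image[top:top + patch_size, left:left + patch_size]
            # Each tile sees no information from its neighbours.
            out[top:top + patch_size, left:left + patch_size] = patch_fn(tile)
    return out
```

Because every tile is processed without access to its neighbours, any `patch_fn` that depends on context (for instance one that normalizes by the tile mean) produces results that change with the tiling. That inconsistency between patches is the issue the abstract says the cross-patch module is designed to address.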

Computer Vision and Pattern Recognition
