Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to a different, unlabeled target domain. Most existing UDA methods focus on learning domain-invariant feature representations, at either the domain level or the category level, using convolutional neural network (CNN)-based frameworks. One fundamental problem for category-level UDA is the production of pseudo labels for samples in the target domain, which are usually too noisy for accurate domain alignment, inevitably compromising UDA performance. With the success of the Transformer in various tasks, we find that the cross-attention in the Transformer is robust to noisy input pairs and enables better feature alignment, so in this paper the Transformer is adopted for the challenging UDA task. Specifically, to generate accurate input pairs, we design a two-way center-aware labeling algorithm to produce pseudo labels for target samples. Along with the pseudo labels, a weight-sharing triple-branch Transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively. This design explicitly enforces the framework to learn discriminative domain-specific and domain-invariant representations simultaneously. The proposed method, dubbed CDTrans (cross-domain transformer), provides one of the first attempts to solve UDA tasks with a pure Transformer solution. Extensive experiments show that the proposed method achieves the best performance on the Office-Home, VisDA-2017, and DomainNet datasets.
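To illustrate the alignment mechanism, the cross-attention between a source branch and a target branch can be sketched as below. This is a minimal NumPy sketch with random projection weights and illustrative shapes, not the CDTrans implementation: queries come from source tokens while keys and values come from target tokens, so each source token is re-expressed as a mixture of target features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(src, tgt, d_k=64, seed=0):
    """Queries from source tokens attend to keys/values from target tokens."""
    rng = np.random.default_rng(seed)
    d = src.shape[-1]
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = src @ W_q, tgt @ W_k, tgt @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_src, n_tgt) alignment weights
    return attn @ V                          # source tokens expressed via target features

src = np.random.default_rng(1).standard_normal((4, 32))  # 4 source tokens
tgt = np.random.default_rng(2).standard_normal((6, 32))  # 6 target tokens
out = cross_attention(src, tgt)
print(out.shape)  # (4, 64)
```

In the full framework, the same weight-shared attention layers also run self-attention within each domain, which this sketch omits.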
Ziyi Meng, Zhenming Yu, Kun Xu (2021)
We consider using untrained neural networks to solve the reconstruction problem of snapshot compressive imaging (SCI), which uses a two-dimensional (2D) detector to capture a high-dimensional (usually 3D) data cube in a compressed manner. Various SCI systems have been built in recent years to capture data such as high-speed videos and hyperspectral images, and the state-of-the-art reconstruction is obtained by deep neural networks. However, most of these networks are trained end-to-end on a large corpus of (sometimes simulated) ground-truth/measurement pairs. In this paper, inspired by untrained neural networks such as the deep image prior (DIP) and deep decoders, we develop a framework that integrates DIP into the plug-and-play regime, leading to a self-supervised network for spectral SCI reconstruction. Extensive synthetic- and real-data results show that the proposed algorithm, without training, is capable of achieving results competitive with training-based networks. Furthermore, by integrating the proposed method with a pre-trained deep denoising prior, we achieve state-of-the-art results. Our code is available at https://github.com/mengziyi64/CASSI-Self-Supervised.
Yubin Zang, Zhenming Yu, Kun Xu (2021)
In this paper, a novel principle-driven fiber transmission model based on a physics-informed neural network (PINN) is proposed. Unlike data-driven models, which treat the fiber transmission problem as a data regression task, this model views it as an equation-solving problem. Instead of adopting input and output signals calculated in advance by the split-step Fourier method (SSFM) before training, this principle-driven PINN-based fiber model takes frames of time and distance as its inputs and the corresponding real and imaginary parts of the nonlinear Schrödinger equation (NLSE) solutions as its outputs. By taking into account pulses and signals before transmission as initial conditions, and fiber physical principles in the form of the NLSE in the design of the loss functions, the model progressively learns the transmission rules. Therefore, it can be trained effectively without data labels, i.e., the pre-calculated signals after transmission required by data-driven models. Thanks to this advantage, the SSFM algorithm is no longer needed before training the principle-driven fiber model, which saves considerable computation time. Numerical demonstrations show that this principle-driven PINN-based fiber model can handle the prediction of pulse evolution, signal transmission, and fiber birefringence for different transmission parameters of fiber telecommunications.
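The physics part of such a loss can be illustrated with a minimal NumPy sketch: the NLSE residual is evaluated on a distance-time grid, with central finite differences standing in for the automatic differentiation a real PINN would use. The parameters (beta2 = -1, gamma = 1) and the analytic fundamental soliton used as a sanity check are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Grid; the exact fundamental soliton A(z,t) = sech(t) * exp(i z / 2) solves
# i A_z + 0.5 A_tt + |A|^2 A = 0  (lossless NLSE with beta2 = -1, gamma = 1).
t = np.arange(-10, 10, 0.05)
z = np.arange(0, 1, 0.01)
Z, T = np.meshgrid(z, t, indexing="ij")
A = 1 / np.cosh(T) * np.exp(1j * Z / 2)

def nlse_physics_loss(A, dz, dt):
    """Mean squared NLSE residual on interior grid points (central differences)."""
    A_z  = (A[2:, 1:-1] - A[:-2, 1:-1]) / (2 * dz)
    A_tt = (A[1:-1, 2:] - 2 * A[1:-1, 1:-1] + A[1:-1, :-2]) / dt**2
    A_in = A[1:-1, 1:-1]
    residual = 1j * A_z + 0.5 * A_tt + np.abs(A_in) ** 2 * A_in
    return np.mean(np.abs(residual) ** 2)

loss = nlse_physics_loss(A, dz=0.01, dt=0.05)
print(loss)  # near zero for the exact solution; larger for any wrong field
```

In the actual model this residual, plus an initial-condition term at z = 0, would be minimized over the network's predicted field rather than evaluated on a known solution.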
Yankun Xu, Jie Yang, Shiqi Zhao (2021)
An accurate seizure prediction system enables early warnings before the seizure onset of epileptic patients, which is extremely important for drug-refractory patients. Conventional seizure prediction works usually rely on features extracted from electroencephalography (EEG) recordings and classification algorithms such as regression or support vector machines (SVMs) to locate the short period before seizure onset. However, such methods cannot achieve high-accuracy prediction due to the information loss of hand-crafted features and the limited classification ability of regression and SVM algorithms. In this paper, we propose an end-to-end deep learning solution using a convolutional neural network (CNN). One- and two-dimensional kernels are adopted in the early- and late-stage convolution and max-pooling layers, respectively. The proposed CNN model is evaluated on the Kaggle intracranial and CHB-MIT scalp EEG datasets. Overall sensitivity, false prediction rate, and area under the receiver operating characteristic curve reach 93.5%, 0.063/h, 0.981 and 98.8%, 0.074/h, 0.988 on the two datasets, respectively. Comparison with state-of-the-art works indicates that the proposed model achieves superior prediction performance.
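The reported metrics can be computed as below. This is a hedged sketch with textbook definitions of sensitivity and false-prediction rate (alarms per hour); the paper's exact alarm and post-processing protocol may differ, and the data here are made up for illustration.

```python
import numpy as np

def seizure_metrics(y_true, y_score, hours, thresh=0.5):
    """Sensitivity and false-prediction rate (false alarms per hour)
    from per-segment labels and predicted scores (illustrative definitions)."""
    y_pred = (y_score >= thresh).astype(int)
    tp = ((y_pred == 1) & (y_true == 1)).sum()  # preictal segments caught
    fn = ((y_pred == 0) & (y_true == 1)).sum()  # preictal segments missed
    fp = ((y_pred == 1) & (y_true == 0)).sum()  # false alarms
    sensitivity = tp / (tp + fn)
    fpr_per_hour = fp / hours
    return sensitivity, fpr_per_hour

# Toy data: 3 preictal segments, 5 interictal segments, 10 hours of recording.
y_true  = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.2])
sens, fph = seizure_metrics(y_true, y_score, hours=10.0)
print(sens, fph)  # 0.666..., 0.1
```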
Guiyu Cao, Kun Xu, Liang Pan (2021)
In this paper, a high-order gas-kinetic scheme in general curvilinear coordinates (HGKS-cur) is developed for the numerical simulation of compressible turbulence. Based on a coordinate transformation, the Bhatnagar-Gross-Krook (BGK) equation is transformed from physical space to computational space. To deal with a general mesh given by discretized points, the geometrical metrics are constructed by dimension-by-dimension Lagrangian interpolation. The multidimensional weighted essentially non-oscillatory (WENO) reconstruction is adopted in the computational domain for spatial accuracy, where the reconstructed variables are the cell-averaged Jacobian and the Jacobian-weighted conservative variables. The two-stage fourth-order method, originally developed for spatial-temporal coupled flow solvers, is used for temporal discretization. Numerical examples for inviscid and laminar flows validate the accuracy and geometric conservation law of HGKS-cur. As a direct application, HGKS-cur is used for implicit large eddy simulation (iLES) of compressible wall-bounded turbulent flows, including compressible turbulent channel flow and compressible turbulent flow over periodic hills. The iLES results with HGKS-cur are in good agreement with the reference spectral methods and high-order finite volume methods. The performance of HGKS-cur demonstrates its capability as a powerful tool for the numerical simulation of compressible wall-bounded turbulent flows and massively separated flows.
Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. Label inference was recently introduced as the problem of reconstructing the ground-truth labels of this private dataset from just the (possibly perturbed) loss function values evaluated at chosen prediction vectors, without any other access to the hidden dataset. Existing results have demonstrated that this inference is possible for specific loss functions such as the cross-entropy loss. In this paper, we introduce the notion of codomain separability to formally study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values. Using this notion, we show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman-divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels. We demonstrate that these attacks can also be carried out through actual neural network models, and analyze, both formally and empirically, the role of finite-precision arithmetic in this setting.
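A toy version of such an attack against an idealized, noise-free average cross-entropy oracle can be sketched as follows. The construction is for illustration only: the query entries are valid probabilities in (0, 1) but the rows deliberately do not sum to 1, and the paper's actual attacks additionally handle noise and finite precision.

```python
import numpy as np

# Oracle: returns the average cross-entropy L(P) = (1/n) * sum_i -log P[i, y_i]
# of a submitted prediction matrix P against hidden labels y.
rng = np.random.default_rng(0)
n, K = 3, 3                      # 3 hidden samples, 3 classes
y_hidden = rng.integers(0, K, n)

def loss_oracle(P):
    return -np.mean(np.log(P[np.arange(n), y_hidden]))

# Attack: craft P so that -log P[i, c] = (c + 1) * (K+1)**i.  Then
# n * L(P) = sum_i (y_i + 1) * (K+1)**i, i.e. each hidden label occupies
# its own base-(K+1) digit of a single loss query.
base = K + 1
P_query = np.exp(-np.fromfunction(
    lambda i, c: (c + 1) * base**i, (n, K), dtype=float))
m = round(n * loss_oracle(P_query))

# Decode the base-(K+1) digits back into labels.
y_recovered = [(m // base**i) % base - 1 for i in range(n)]
print(y_recovered, list(y_hidden))  # identical lists
```

One query suffices here because the digits cannot collide; with bounded noise, the paper's codomain-separability condition plays the role of keeping the possible loss values apart.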
Mingkun Xu, Yujie Wu, Lei Deng (2021)
Biological spiking neurons with intrinsic dynamics underlie the powerful representation and learning capabilities of the brain for processing multimodal information in complex environments. Despite recent tremendous progress in spiking neural networks (SNNs) for handling Euclidean-space tasks, it remains challenging to exploit SNNs for processing non-Euclidean data represented as graphs, mainly due to the lack of an effective modeling framework and useful training techniques. Here we present a general spike-based modeling framework that enables the direct training of SNNs for graph learning. Through spatial-temporal unfolding of the spiking data flows of node features, we incorporate graph convolution filters into the spiking dynamics and formalize a synergistic learning paradigm. Considering the unique features of spike representation and spiking dynamics, we propose a spatial-temporal feature normalization (STFN) technique suitable for SNNs to accelerate convergence. We instantiate our methods in two spiking graph models, graph convolution SNNs and graph attention SNNs, and validate their performance on three node-classification benchmarks: Cora, Citeseer, and Pubmed. Our models achieve performance comparable to state-of-the-art graph neural network (GNN) models at much lower computation cost, demonstrating great benefits for execution on neuromorphic hardware and promoting neuromorphic applications in graph scenarios.
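The exact STFN formula is given in the paper; an assumed simplification that conveys the idea, namely jointly normalizing each feature channel over the node (spatial) and time-step (temporal) dimensions before the spiking nonlinearity, might look like the following sketch. Shapes and statistics here are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def stfn(x, eps=1e-5):
    """Simplified spatial-temporal feature normalization (assumed form):
    each feature channel is standardized over nodes and time steps jointly.
    x has shape (T, N, F) = (time steps, graph nodes, features)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

T, N, F = 8, 5, 16            # 8 time steps, 5 graph nodes, 16 features
x = np.random.default_rng(0).standard_normal((T, N, F)) * 3 + 1
x_norm = stfn(x)
print(x_norm.mean(), x_norm.std())  # each channel standardized to ~0 mean, ~1 std
```

In a real spiking pipeline this normalization would be applied to the presynaptic inputs at each layer so that membrane potentials stay in a range where the firing threshold remains informative.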
Demographic bias is a significant challenge in practical face recognition systems. Existing methods heavily rely on accurate demographic annotations. However, such annotations are usually unavailable in real scenarios. Moreover, these methods are typically designed for a specific demographic group and are not general enough. In this paper, we propose a false positive rate penalty loss, which mitigates face recognition bias by increasing the consistency of the instance false positive rate (FPR). Specifically, we first define the instance FPR as the ratio between the number of non-target similarities above a unified threshold and the total number of non-target similarities, where the unified threshold is estimated for a given total FPR. Then, an additional penalty term, proportional to the ratio of the instance FPR to the overall FPR, is introduced into the denominator of the softmax-based loss. The larger the instance FPR, the larger the penalty. Through such unequal penalties, the instance FPRs are driven to be consistent. Compared with previous debiasing methods, our method requires no demographic annotations. Thus, it can mitigate the bias among demographic groups divided by various attributes, and these attributes need not be predefined during training. Extensive experimental results on popular benchmarks demonstrate the superiority of our method over state-of-the-art competitors. Code and trained models are available at https://github.com/Tencent/TFace.
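A hedged NumPy sketch of the loss described above is given below. The use of class-prototype cosine similarities, the scale `s`, and the penalty weight `alpha` are illustrative assumptions; only the structure (threshold from a target total FPR, instance FPR, and a penalty added to the softmax denominator) follows the text.

```python
import numpy as np

def instance_fpr_penalty_loss(sim, labels, total_fpr=0.01, s=16.0, alpha=0.5):
    """Sketch of a softmax loss with an instance-FPR penalty in the denominator.

    sim:    (n, C) cosine similarities between instances and class prototypes
    labels: (n,) ground-truth class indices
    """
    n, C = sim.shape
    mask = np.ones_like(sim, dtype=bool)
    mask[np.arange(n), labels] = False
    non_target = sim[mask].reshape(n, C - 1)

    # Unified threshold: the (1 - total_fpr) quantile of all non-target sims.
    thresh = np.quantile(non_target, 1.0 - total_fpr)
    inst_fpr = (non_target > thresh).mean(axis=1)   # per-instance FPR
    overall = inst_fpr.mean() + 1e-12               # overall FPR

    # Softmax denominator gets an extra term proportional to inst_fpr/overall,
    # so instances with above-average FPR are penalized more.
    target_logit = s * sim[np.arange(n), labels]
    neg_sum = np.exp(s * non_target).sum(axis=1)
    denom = np.exp(target_logit) + (1.0 + alpha * inst_fpr / overall) * neg_sum
    return float(np.mean(-(target_logit - np.log(denom))))

rng = np.random.default_rng(0)
sim = rng.uniform(-1, 1, (32, 10))
labels = rng.integers(0, 10, 32)
loss = instance_fpr_penalty_loss(sim, labels)
print(loss)
```

With `alpha = 0` the expression reduces to an ordinary scaled-softmax cross-entropy, which makes the role of the penalty term easy to isolate.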
Han Wu, Kun Xu, Linfeng Song (2021)
Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks. In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks. In particular, three objectives, including a novel objective focusing on modeling predicate-argument relations, are evaluated on two challenging dialogue understanding tasks. Experimental results demonstrate that domain-adaptive pretraining with proper objectives can significantly improve the performance of a strong baseline on these tasks, achieving new state-of-the-art performance.
Yunhao Yang, Zhaokun Xue (2021)
Heterogeneity of sentences exists in sequence-to-sequence tasks such as machine translation: sentences with largely varied meanings or grammatical structures may increase the difficulty of convergence while training the network. In this paper, we introduce a model to resolve this heterogeneity in sequence-to-sequence tasks. The Multi-filter Gaussian Mixture Autoencoder (MGMAE) uses an autoencoder to learn representations of the inputs. The representations are the outputs of the encoder, lying in a latent space whose dimension equals the hidden dimension of the encoder. The latent-space representations of the training data are used to fit a Gaussian mixture, dividing them among several Gaussian components. A filter (decoder) is tuned to fit the data of one specific Gaussian component; each Gaussian corresponds to one filter, so that filter is responsible for the heterogeneity within its component. Thus, the heterogeneity of the training data can be resolved. Comparative experiments are conducted on the Geo-query dataset and English-French translation. Our experiments show that, compared with a traditional encoder-decoder model, this network achieves better performance on sequence-to-sequence tasks such as machine translation and question answering.
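The routing step can be sketched as follows. This is a simplified stand-in: hard assignment to the nearest of two spherical components replaces full Gaussian-mixture responsibilities, and the latent codes, shapes, and cluster parameters are all illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in latent codes from the encoder, forming two clusters of
# "heterogeneous" sentence groups.  The paper fits a Gaussian mixture on
# these codes; here we hard-assign to the nearest of two spherical
# components, which is what the mixture reduces to in the equal-covariance,
# hard-assignment limit.
latents = np.vstack([rng.normal(-2, 0.5, (50, 8)),
                     rng.normal(+2, 0.5, (50, 8))])
means = np.array([latents[:50].mean(axis=0), latents[50:].mean(axis=0)])

def route(z, means):
    """Pick the decoder (filter) whose component mean is nearest to each code."""
    d = ((z[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

assignments = route(latents, means)
print(assignments[:3], assignments[-3:])  # first cluster -> filter 0, second -> 1
```

At training time, each decoder would then only see (and specialize on) the examples routed to its component.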