Research papers and master's and doctoral theses published by Jie Lu

Dependent Indian Buffet Process-based Sparse Nonparametric Nonnegative Matrix Factorization

52 - Junyu Xuan, Jie Lu, Guangquan Zhang 2015

Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices appropriate for the intended applications. The method has been widely used for unsupervised learning tasks, including recommender systems (rating matrix of users by items) and document clustering (weighting matrix of papers by keywords). However, traditional NMF methods typically assume the number of latent factors (i.e., dimensionality of the loading matrices) to be fixed. This assumption makes them inflexible for many applications. In this paper, we propose a nonparametric NMF framework to mitigate this issue by using dependent Indian Buffet Processes (dIBP). In a nutshell, we apply a correlation function for the generation of two stick weights associated with each pair of columns of loading matrices, while still maintaining their respective marginal distributions specified by IBP. As a consequence, the generation of the two loading matrices will be column-wise (indirectly) correlated. Under this same framework, two classes of correlation function are proposed: (1) using a Bivariate beta distribution and (2) using a Copula function. Both methods allow us to adopt our work for various applications by flexibly choosing appropriate parameter settings. Compared with other state-of-the-art approaches in this area, such as Gaussian Process (GP)-based dIBP, our work is seen to be much more flexible in terms of allowing the two corresponding binary matrix columns to have greater variations in their non-zero entries. Our experiments on real-world and synthetic datasets show that the three proposed models perform well on the document clustering task compared with standard NMF, without predefining the dimension of the factor matrices, and that the Bivariate beta distribution-based and Copula-based models have better flexibility than the GP-based model.
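The idea of correlating two stick weights while keeping their IBP marginals can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard stick-breaking construction of the IBP (stick fractions drawn from Beta(alpha, 1) and multiplied cumulatively), a Gaussian copula as the correlation function, and a finite truncation level. The names `coupled_stick_weights`, `sample_binary_matrix`, `rho`, and `num_sticks` are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coupled_stick_weights(num_sticks, alpha_a, alpha_b, rho, rng):
    """Draw two sequences of IBP stick weights coupled by a Gaussian copula.

    Each stick fraction keeps its Beta(alpha, 1) marginal; only the
    dependence between the two fractions is controlled by rho.
    Truncating at num_sticks is an assumption made for illustration.
    """
    weights_a, weights_b = [], []
    prod_a = prod_b = 1.0
    for _ in range(num_sticks):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to correlated uniforms (this is the copula step).
        u1, u2 = normal_cdf(z1), normal_cdf(z2)
        # Inverse CDF of Beta(alpha, 1) is u ** (1 / alpha).
        prod_a *= u1 ** (1.0 / alpha_a)
        prod_b *= u2 ** (1.0 / alpha_b)
        weights_a.append(prod_a)
        weights_b.append(prod_b)
    return weights_a, weights_b


def sample_binary_matrix(num_rows, weights, rng):
    """Sample a binary matrix whose k-th column is Bernoulli(weights[k])."""
    return [[1 if rng.random() < w else 0 for w in weights]
            for _ in range(num_rows)]


rng = random.Random(0)
weights_a, weights_b = coupled_stick_weights(8, 2.0, 3.0, 0.9, rng)
binary_a = sample_binary_matrix(5, weights_a, rng)
binary_b = sample_binary_matrix(4, weights_b, rng)
```

With `rho` near 1 the k-th columns of the two binary matrices tend to be active or inactive together, and with `rho = 0` the two matrices are independent IBP draws under this truncation.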

Machine Learning

Nonnegative Multi-level Network Factorization for Latent Factor Analysis

130 - Junyu Xuan, Jie Lu, Xiangfeng Luo 2015

Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, although networks between nodes of the same nature exist, standard NMF overlooks them, e.g., the social network between users. This problem leads to comparatively low recommendation accuracy because these networks are also reflections of the nature of the nodes, such as the preferences of users in a social network. Also, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of nodes, so retaining the different network structures will lead to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on the standard NMF, which integrate the vertical network (e.g., rating matrix) with the structures of the horizontal network (e.g., user social network). These algorithms are carefully designed with corresponding convergence proofs to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms are able to preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation with standard NMF, and that the various structures differ in their performance on these two tasks. These results can be directly used in document clustering and recommendation systems.
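For orientation, the standard NMF baseline that the abstract builds on can be sketched as follows. This is not one of the four proposed multi-level algorithms. It assumes the well-known multiplicative update rules for the squared Frobenius error, and it uses no network information. The function names, the fixed `rank`, and the small `eps` guard against division by zero are illustrative choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps the updates well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy "rating matrix" with an exact rank-2 nonnegative factorization.
ratings = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 5.0]]
w, h = nmf(ratings, rank=2)
```

Because the updates only multiply by nonnegative ratios, `w` and `h` stay nonnegative throughout. A network-aware variant would add terms that depend on the horizontal network, which this sketch deliberately leaves out.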

Social and Information Networks, Information Retrieval

Nonparametric Relational Topic Models through Dependent Gamma Processes

118 - Junyu Xuan, Jie Lu, Guangquan Zhang 2015

Traditional Relational Topic Models provide a way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. However, existing relational topic models are based on an assumption that the number of hidden topics is known in advance, and this is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different gamma process from the global gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capabilities of learning the hidden topics and, more importantly, the number of topics.

Machine Learning, Computation and Language, Information Retrieval

Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process

84 - Junyu Xuan, Jie Lu, Guangquan Zhang 2015

Incorporating the side information of a text corpus, i.e., authors, time stamps, and emotional tags, into the traditional text mining models has gained significant interest in the areas of information retrieval, statistical natural language processing, and machine learning. One branch of these works is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability on a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the multi-author contribution towards this document. An efficient Gibbs sampling inference algorithm with each conditional distribution being closed-form is developed for the IAT model. Experiments on several real-world datasets show the capabilities of our IAT model to learn the hidden topics, the authors' interests in these topics, and the number of topics simultaneously.

Machine Learning, Information Retrieval, Machine Learning

Quantum percolation in quantum spin Hall antidot systems

64 - Rui-Lin Chu, Jie Lu, 2012

We study the influences of antidot-induced bound states on the transport properties of two-dimensional quantum spin Hall insulators. The bound states are found to be able to induce quantum percolation in the originally insulating bulk. At some critical antidot densities, the quantum spin Hall phase can be completely destroyed due to the maximum quantum percolation. For systems with periodic boundaries, the maximum quantum percolation between the bound states creates intermediate extended states in the bulk, which is originally gapped and insulating. The antidot-induced bound states play the same role as the magnetic field in the quantum Hall effect: both make electrons go into cyclotron motions. We also draw an analogy between the quantum percolation phenomena in this system and those in the network models of the quantum Hall effect.

Physics, Mesoscale and Nanoscale

Z2 invariant protected bound states in topological insulators

97 - Wen-Yu Shan, Jie Lu, Hai-Zhou Lu 2010

We present an exact solution of a modified Dirac equation for a topological insulator in the presence of a hole or vacancy to demonstrate that vacancies may induce bound states in the band gap of topological insulators. They arise due to the Z_2 classification of time-reversal invariant insulators, and thus are also topologically protected, like the edge states in the quantum spin Hall effect and the surface states in three-dimensional topological insulators. Coexistence of the in-gap bound states and the edge or surface states in topological insulators suggests that imperfections may affect transport properties of topological insulators via additional bound states near the system boundary.

Physics, Mesoscale and Nanoscale

Surface and Edge States in Topological Semi-metals

119 - Rui-Lin Chu, Wen-Yu Shan, Jie Lu 2010

We study topologically non-trivial semi-metals by means of the 6-band Kane model. The existence of surface states is explicitly demonstrated by calculating the LDOS on the material surface. In the strain-free condition, surface states are divided into two parts in the energy spectrum: one part is in the direct gap, and the other part, including the crossing point of the surface state Dirac cone, is submerged in the valence band. We also show how uni-axial strain induces an insulating band gap and raises the crossing point from the valence band into the band gap, making the system a true topological insulator. We predict the existence of helical edge states and the spin Hall effect in thin-film topological semi-metals, which could be tested in future experiments. Disorder is found to significantly enhance the spin Hall effect in the valence band of the thin films.

Physics, Mesoscale and Nanoscale
