Deep topic modeling by multilayer bootstrap network and lasso


Abstract in English

Topic modeling is widely studied for the dimension reduction and analysis of documents. However, it is formulated as a difficult optimization problem. Current approximate solutions also suffer from inaccurate model- or data-assumptions. To deal with the above problems, we propose a polynomial-time deep topic model with no model and data assumptions. Specifically, we first apply multilayer bootstrap network (MBN), which is an unsupervised deep model, to reduce the dimension of documents, and then use the low-dimensional data representations or their clustering results as the target of supervised Lasso for topic word discovery. To our knowledge, this is the first time that MBN and Lasso are applied to unsupervised topic modeling. Experimental comparison results with five representative topic models on the 20-newsgroups and TDT2 corpora illustrate the effectiveness of the proposed algorithm.

Download