ترغب بنشر مسار تعليمي؟ اضغط هنا

While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work eithe r performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately-sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition. Our approach discovers high quality topical phrases with negligible extra cost to the bag-of-words topic model in a variety of datasets including research publication titles, abstracts, reviews, and news articles.
Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications. In this paper a scalable and robust algorithm is proposed for constructing a hierarchy of topi cs from a text collection. We divide and conquer the problem using a top-down recursive framework, based on a tensor orthogonal decomposition technique. We solve a critical challenge to perform scalable inference for our newly designed hierarchical topic model. Experiments with various real-world datasets illustrate its ability to generate robust, high-quality hierarchies efficiently. Our method reduces the time of construction by several orders of magnitude, and its robust feature renders it possible for users to interactively revise the hierarchy.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا