ترغب بنشر مسار تعليمي؟ اضغط هنا

Scalable and Robust Construction of Topical Hierarchies

169   0   0.0 ( 0 )
 نشر من قبل Chi Wang
 تاريخ النشر 2014
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications. In this paper a scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection. We divide and conquer the problem using a top-down recursive framework, based on a tensor orthogonal decomposition technique. We solve a critical challenge to perform scalable inference for our newly designed hierarchical topic model. Experiments with various real-world datasets illustrate its ability to generate robust, high-quality hierarchies efficiently. Our method reduces the time of construction by several orders of magnitude, and its robust feature renders it possible for users to interactively revise the hierarchy.



قيم البحث

اقرأ أيضاً

166 - Simon Razniewski 2021
Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledg e base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowledge extraction systems engineering, (ii) Dice to illustrate the role that schema constraints play in cleaning fuzzy commonsense knowledge, and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos are available online at https://quasimodo.r2.enst.fr, https://dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.
In this paper, we study how graph transformations based on sesqui-pushout rewriting can be reversed and how the composition of rewrites can be constructed. We illustrate how such reversibility and composition can be used to design an audit trail syst em for individual graphs and graph hierarchies. This provides us with a compact way to maintain the history of updates of an object, including its multip
While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work eithe r performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately-sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition. Our approach discovers high quality topical phrases with negligible extra cost to the bag-of-words topic model in a variety of datasets including research publication titles, abstracts, reviews, and news articles.
Mining tasks over sequential data, such as clickstreams and gene sequences, require a careful design of embeddings usable by learning algorithms. Recent research in feature learning has been extended to sequential data, where each instance consists o f a sequence of heterogeneous items with a variable length. However, many real-world applications often involve attributed sequences, where each instance is composed of both a sequence of categorical items and a set of attributes. In this paper, we study this new problem of attributed sequence embedding, where the goal is to learn the representations of attributed sequences in an unsupervised fashion. This problem is core to many important data mining tasks ranging from user behavior analysis to the clustering of gene sequences. This problem is challenging due to the dependencies between sequences and their associated attributes. We propose a deep multimodal learning framework, called NAS, to produce embeddings of attributed sequences. The embeddings are task independent and can be used on various mining tasks of attributed sequences. We demonstrate the effectiveness of our embeddings of attributed sequences in various unsupervised learning tasks on real-world datasets.
While recent continual learning methods largely alleviate the catastrophic problem on toy-sized datasets, some issues remain to be tackled to apply them to real-world problem domains. First, a continual learning model should effectively handle catast rophic forgetting and be efficient to train even with a large number of tasks. Secondly, it needs to tackle the problem of order-sensitivity, where the performance of the tasks largely varies based on the order of the task arrival sequence, as it may cause serious problems where fairness plays a critical role (e.g. medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our Additive Parameter Decomposition (APD), the task-adaptive parameters for earlier tasks remain mostly unaffected, where we update them only to reflect the changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, we can achieve even better scalability with APD using hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters. We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا