Scalable and Robust Construction of Topical Hierarchies

387 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Chi Wang

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Chi Wang - Xueqing Liu - Yanglei Song

التعلم الآلي الحساب واللغة قواعد البيانات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications. In this paper a scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection. We divide and conquer the problem using a top-down recursive framework, based on a tensor orthogonal decomposition technique. We solve a critical challenge to perform scalable inference for our newly designed hierarchical topic model. Experiments with various real-world datasets illustrate its ability to generate robust, high-quality hierarchies efficiently. Our method reduces the time of construction by several orders of magnitude, and its robust feature renders it possible for users to interactively revise the hierarchy.

قيم البحث

166 - Simon Razniewski 2021

Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledg e base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowledge extraction systems engineering, (ii) Dice to illustrate the role that schema constraints play in cleaning fuzzy commonsense knowledge, and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos are available online at https://quasimodo.r2.enst.fr, https://dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.

الذكاء الاصطناعي الحساب واللغة قواعد البيانات

Reversibility and Composition of Rewriting in Hierarchies

235 - Russ Harmer (Univ Lyon , EnsL , UCBL 2020

In this paper, we study how graph transformations based on sesqui-pushout rewriting can be reversed and how the composition of rewrites can be constructed. We illustrate how such reversibility and composition can be used to design an audit trail syst em for individual graphs and graph hierarchies. This provides us with a compact way to maintain the history of updates of an object, including its multip

المنطق في علوم الحاسوب الذكاء الاصطناعي قواعد البيانات

Scalable Topical Phrase Mining from Text Corpora

446 - Ahmed El-Kishky , Yanglei Song , Chi Wang 2014

While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work eithe r performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately-sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition. Our approach discovers high quality topical phrases with negligible extra cost to the bag-of-words topic model in a variety of datasets including research publication titles, abstracts, reviews, and news articles.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Attributed Sequence Embedding

114 - Zhongfang Zhuang , Xiangnan Kong , Elke Rundensteiner 2019

Mining tasks over sequential data, such as clickstreams and gene sequences, require a careful design of embeddings usable by learning algorithms. Recent research in feature learning has been extended to sequential data, where each instance consists o f a sequence of heterogeneous items with a variable length. However, many real-world applications often involve attributed sequences, where each instance is composed of both a sequence of categorical items and a set of attributes. In this paper, we study this new problem of attributed sequence embedding, where the goal is to learn the representations of attributed sequences in an unsupervised fashion. This problem is core to many important data mining tasks ranging from user behavior analysis to the clustering of gene sequences. This problem is challenging due to the dependencies between sequences and their associated attributes. We propose a deep multimodal learning framework, called NAS, to produce embeddings of attributed sequences. The embeddings are task independent and can be used on various mining tasks of attributed sequences. We demonstrate the effectiveness of our embeddings of attributed sequences in various unsupervised learning tasks on real-world datasets.

التعلم الآلي الحساب واللغة قواعد البيانات

Scalable and Order-robust Continual Learning with Additive Parameter Decomposition

115 - Jaehong Yoon , Saehoon Kim , Eunho Yang 2019

While recent continual learning methods largely alleviate the catastrophic problem on toy-sized datasets, some issues remain to be tackled to apply them to real-world problem domains. First, a continual learning model should effectively handle catast rophic forgetting and be efficient to train even with a large number of tasks. Secondly, it needs to tackle the problem of order-sensitivity, where the performance of the tasks largely varies based on the order of the task arrival sequence, as it may cause serious problems where fairness plays a critical role (e.g. medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our Additive Parameter Decomposition (APD), the task-adaptive parameters for earlier tasks remain mostly unaffected, where we update them only to reflect the changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, we can achieve even better scalability with APD using hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters. We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness.

التعلم الآلي التعلم الالي