Do you want to publish a course? Click here

Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

استخراج المواضيع مع الرسوم البيانية الوصية المتزامنة واللوانية الدلالية: النمذجة النمذجة العصبية للنصوص القصيرة

275   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

Short text nowadays has become a more fashionable form of text data, e.g., Twitter posts, news titles, and product reviews. Extracting semantic topics from short texts plays a significant role in a wide spectrum of NLP applications, and neural topic modeling is now a major tool to achieve it. Motivated by learning more coherent and semantic topics, in this paper we develop a novel neural topic model named Dual Word Graph Topic Model (DWGTM), which extracts topics from simultaneous word co-occurrence and semantic correlation graphs. To be specific, we learn word features from the global word co-occurrence graph, so as to ingest rich word co-occurrence information; we then generate text features with word features, and feed them into an encoder network to get topic proportions per-text; finally, we reconstruct texts and word co-occurrence graph with topical distributions and word features, respectively. Besides, to capture semantics of words, we also apply word features to reconstruct a word semantic correlation graph computed by pre-trained word embeddings. Upon those ideas, we formulate DWGTM in an auto-encoding paradigm and efficiently train it with the spirit of neural variational inference. Empirical results validate that DWGTM can generate more semantically coherent topics than baseline topic models.

References used
https://aclanthology.org/
rate research

Read More

We suggest to model human-annotated Word Usage Graphs capturing fine-grained semantic proximity distinctions between word uses with a Bayesian formulation of the Weighted Stochastic Block Model, a generative model for random graphs popular in biology , physics and social sciences. By providing a probabilistic model of graded word meaning we aim to approach the slippery and yet widely used notion of word sense in a novel way. The proposed framework enables us to rigorously compare models of word senses with respect to their fit to the data. We perform extensive experiments and select the empirically most adequate model.
Most recent studies for relation extraction (RE) leverage the dependency tree of the input sentence to incorporate syntax-driven contextual information to improve model performance, with little attention paid to the limitation where high-quality depe ndency parsers in most cases unavailable, especially for in-domain scenarios. To address this limitation, in this paper, we propose attentive graph convolutional networks (A-GCN) to improve neural RE methods with an unsupervised manner to build the context graph, without relying on the existence of a dependency parser. Specifically, we construct the graph from n-grams extracted from a lexicon built from pointwise mutual information (PMI) and apply attention over the graph. Therefore, different word pairs from the contexts within and across n-grams are weighted in the model and facilitate RE accordingly. Experimental results with further analyses on two English benchmark datasets for RE demonstrate the effectiveness of our approach, where state-of-the-art performance is observed on both datasets.
Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text. Most topic models rely on word co-occurrence for computing a topic, i.e., a weighted set of words that together represent a high-level s emantic concept. In this paper, we propose a new light-weight Self-Supervised Neural Topic Model (SNTM) that learns a rich context by learning a topic representation jointly from three co-occurring words and a document that the triple originates from. Our experimental results indicate that our proposed neural topic model, SNTM, outperforms previously existing topic models in coherence metrics as well as document clustering accuracy. Moreover, apart from the topic coherence and clustering performance, the proposed neural topic model has a number of advantages, namely, being computationally efficient and easy to train.
Text generation from semantic graphs is traditionally performed with deterministic methods, which generate a unique description given an input graph. However, the generation problem admits a range of acceptable textual outputs, exhibiting lexical, sy ntactic and semantic variation. To address this disconnect, we present two main contributions. First, we propose a stochastic graph-to-text model, incorporating a latent variable in an encoder-decoder model, and its use in an ensemble. Second, to assess the diversity of the generated sentences, we propose a new automatic evaluation metric which jointly evaluates output diversity and quality in a multi-reference setting. We evaluate the models on WebNLG datasets in English and Russian, and show an ensemble of stochastic models produces diverse sets of generated sentences while, retaining similar quality to state-of-the-art models.
Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitat e zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised task such as topic modeling. Thus, we propose several methods for fine-tuning encoders to improve both monolingual and zero-shot polylingual neural topic modeling. We consider fine-tuning on auxiliary tasks, constructing a new topic classification task, integrating the topic classification objective directly into topic model training, and continued pre-training. We find that fine-tuning encoder representations on topic classification and integrating the topic classification task directly into topic modeling improves topic quality, and that fine-tuning encoder representations on any task is the most important factor for facilitating cross-lingual transfer.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا