
Multi-source Neural Topic Modeling in Multi-view Embedding Spaces


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor



Though word embeddings and topics are complementary representations, several past works have used only pretrained word embeddings in (neural) topic modeling to address data sparsity in short texts or small document collections. This work presents a novel neural topic modeling framework using multi-view embedding spaces: (1) pretrained topic embeddings, and (2) pretrained word embeddings (context-insensitive from GloVe and context-sensitive from BERT models), used jointly from one or many sources to improve topic quality and better deal with polysemy. In doing so, we first build respective pools of pretrained topic embeddings (i.e., TopicPool) and word embeddings (i.e., WordPool). We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR), using short-text, long-text, small and large document collections from news and medical domains. Introducing the multi-source multi-view embedding spaces, we have shown state-of-the-art neural topic modeling using 6 source (high-resource) and 5 target (low-resource) corpora.
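To make the two intrinsic measures named in the abstract concrete, here is a minimal sketch of how perplexity and topic coherence are commonly computed. It is not the paper's evaluation code: the abstract does not say how perplexity is normalised or which coherence measure is used, so the corpus-level perplexity formula, the choice of NPMI as the coherence measure, and the function names `perplexity` and `npmi_coherence` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus-level perplexity (assumed definition): the exponential of the
    negative average per-word log-likelihood. Lower is better."""
    total_ll = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_ll / total_words)


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of a topic's top words (assumed measure),
    with probabilities estimated from document-level co-occurrence.
    Needs at least two topic words and one document."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: lowest NPMI by convention
        elif p12 == 1:
            scores.append(1.0)   # always co-occur: avoids dividing by log(1) = 0
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy corpus: "a" and "b" always appear together, "c" appears alone.
    docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
    print(npmi_coherence(["a", "b"], docs))
    print(npmi_coherence(["a", "c"], docs))
    # One 4-word document whose words each have probability 0.25.
    print(perplexity([4 * math.log(0.25)], [4]))
```

In this sketch a model that generalises better assigns higher likelihood to held-out documents and so gets a lower perplexity, while a topic whose top words tend to appear in the same documents gets an NPMI closer to 1.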

References used

https://aclanthology.org/


Topic Modeling for Maternal Health Using Reddit

Association for Computational Linguistics, 2021 (article)

This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites. We examine Latent Dirichlet Allocation (LDA) and two state-of-the-art methods: neural topic model with knowledge distillation (KD) and Embedded Topic Model (ETM) on maternal health texts collected from Reddit. The models are evaluated on topic quality and topic inference, using both auto-evaluation metrics and human assessment. We analyze a disconnect between automatic metrics and human evaluations. While LDA performs the best overall with the auto-evaluation metrics NPMI and Coherence, Neural Topic Model with Knowledge Distillation is favored in expert evaluation. We also create a new partially expert annotated gold-standard maternal health topic

maternal health, latent Dirichlet allocation

Generating Mammography Reports from Multi-view Mammograms with BERT

Association for Computational Linguistics, 2021 (article)

Writing mammography reports can be error-prone and time-consuming for radiologists. In this paper we propose a method to generate mammography reports given four images, corresponding to the four views used in screening mammography. To the best of our knowledge our work represents the first attempt to generate the mammography report using deep-learning. We propose an encoder-decoder model that includes an EfficientNet-based encoder and a Transformer-based decoder. We demonstrate that the Transformer-based attention mechanism can combine visual and semantic information to localize salient regions on the input mammograms and generate a visually interpretable report. The conducted experiments, including an evaluation by a certified radiologist, show the effectiveness of the proposed method.

generating mammography reports, multi-view mammograms, mammography reports

Multiple Captions Embellished Multilingual Multi-Modal Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Neural machine translation based on bilingual text with limited training data suffers from limited lexical diversity, which lowers rare-word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase lexical diversity, aided by the cross-lingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of visual features boosts translation quality by a significant margin. An empirical study confirms that our proposed multimodal approach achieves substantial gains in automatic scores and is robust in handling rare-word translation in the context of English to/from Hindi and Telugu translation tasks.

cross-lingual training, embellished multilingual multi-modal, multi-modal neural machine

More is Better: Enhancing Open-Domain Dialogue Generation via Multi-Source Heterogeneous Knowledge

Association for Computational Linguistics, 2021 (article)

Despite achieving remarkable performance, previous knowledge-enhanced works usually use only a single-source homogeneous knowledge base with limited knowledge coverage. Thus, they often degenerate into traditional methods because not all dialogues can be linked with knowledge entries. This paper proposes a novel dialogue generation model, MSKE-Dialog, to solve this issue with three unique advantages: (1) Rather than only one, MSKE-Dialog can simultaneously leverage multiple heterogeneous knowledge sources (including but not limited to commonsense knowledge facts, text knowledge, and infobox knowledge) to improve the knowledge coverage; (2) To avoid topic conflict among the context and different knowledge sources, we propose a Multi-Reference Selection to better select context/knowledge; (3) We propose a Multi-Reference Generation to generate informative responses by referring to multiple generation references at the same time. Extensive evaluations on a Chinese dataset show the superior performance of this work against various state-of-the-art approaches. To the best of our knowledge, this work is the first to use multi-source heterogeneous knowledge in open-domain knowledge-enhanced dialogue generation.

enhancing open-domain dialogue, enhancing open-domain

Multi-view Subword Regularization

Association for Computational Linguistics, 2021 (article)

Multilingual pretrained representations generally rely on subword segmentation algorithms to create a shared multilingual vocabulary. However, standard heuristic algorithms often lead to sub-optimal segmentation, especially for languages with limited amounts of data. In this paper, we take two major steps towards alleviating this problem. First, we demonstrate empirically that applying existing subword regularization methods (Kudo, 2018; Provilkov et al., 2020) during fine-tuning of pre-trained multilingual representations improves the effectiveness of cross-lingual transfer. Second, to take full advantage of different possible input segmentations, we propose Multi-view Subword Regularization (MVR), a method that enforces the consistency of predictors between using inputs tokenized by the standard and probabilistic segmentations. Results on the XTREME multilingual benchmark (Hu et al., 2020) show that MVR brings consistent improvements of up to 2.5 points over using standard segmentation algorithms.

multi-view subword regularization, subword regularization, multi-view subword
