تطبق هذه الورقة نمذجة الموضوع لفهم موضوعات صحة الأم والاهتمامات والأسئلة المعبرين عنها في المجتمعات عبر الإنترنت على مواقع الشبكات الاجتماعية.ندرس تحليل Dirichlet الكامن (LDA) وطريقين حديثين: نموذج موضوع عصبي مع تقطير المعرفة (KD) ونموذج الموضوع المدمج (ETM) على نصوص صحة الأم يتم جمعها من Reddit.يتم تقييم النماذج على جودة موضوع الاستدلال والموضوع، باستخدام مقاييس التقييم التلقائي والتقييم البشري.نحن نحلل قطع اتصال بين المقاييس التلقائية والتقييمات البشرية.في حين أن LDA يؤدي الأفضل بشكل عام مع مقاييس التقييم التلقائي NPMI والتماسك، فإن نموذج الموضوع العصبي مع تقطير المعرفة مواتية من خلال تقييم الخبراء.ونحن أيضا إنشاء خبير جديد جزئيا مشروح موضوع صحة الأم
This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites. We examine Latent Dirichlet Analysis (LDA) and two state-of-the-art methods: neural topic model with knowledge distillation (KD) and Embedded Topic Model (ETM) on maternal health texts collected from Reddit. The models are evaluated on topic quality and topic inference, using both auto-evaluation metrics and human assessment. We analyze a disconnect between automatic metrics and human evaluations. While LDA performs the best overall with the auto-evaluation metrics NPMI and Coherence, Neural Topic Model with Knowledge Distillation is favorable by expert evaluation. We also create a new partially expert annotated gold-standard maternal health topic
References used
https://aclanthology.org/
Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short-text or small collection of documents. This work presents a
Personality and demographics are important variables in social sciences and computational sociolinguistics. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first dataset of Reddit co
We have introduced a new applications for Dynamic Factor Graphs, consisting in topic modeling, text classification and information retrieval. DFGs are tailored here to sequences of time-stamped documents.
Based on the auto-encoder architecture, our
Building maintenance is gaining an increasing attention in the field of scientific
research and there was a need for the use of new technologies in maintenance
management, , as the facility management deal with a large amount of information relatin
Skin-to-skin contact (SSC) between the mother and her infant immediately postdelivery
is an important procedure that must be included in the care given to mother and
her infant many health benefits. The mother's desire and reaction towards (SSC) is