
Application to Topic Modeling of Time-Stamped Documents and apply that for Algorithms

تطبيق لنمذجة موضوع وثائق ذات طابع زمني و تطبيق ذلك على الخوارزميات


Added by Al-Baath University (research paper)

Publication date: 2014

Research language: Arabic

Authors: مها وهبي (researcher)

Created by Shamra Editor

Keywords: auto-encoder, topic modeling, documents, latent representations, Expectation Maximization algorithm, maximum likelihood, joint likelihood maximization


Abstract

We introduce a new application of Dynamic Factor Graphs (DFGs) to topic modeling, text classification and information retrieval. DFGs are tailored here to sequences of time-stamped documents. Based on the auto-encoder architecture, our nonlinear multi-layer model is trained stage-wise to produce increasingly compact representations of bags-of-words at the document or paragraph level, thus performing a semantic analysis. It also incorporates simple temporal dynamics on the latent representations, to take advantage of the inherent (hierarchical) structure of sequences of documents, and can simultaneously perform supervised classification or regression on document labels, which makes our approach unique. The model is learned by maximizing the joint likelihood of the encoding, decoding, dynamical and supervised modules, which is possible using approximate, gradient-based maximum-a-posteriori inference. We demonstrate that minimizing a weighted cross-entropy loss between histograms of word occurrences and their reconstruction directly minimizes the topic model perplexity, and show that our topic model obtains lower perplexity than Latent Dirichlet Allocation on the NIPS and State of the Union datasets. We illustrate how the dynamical constraints help learning while making it possible to visualize topic trajectories.
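The link between weighted cross-entropy and perplexity stated in the abstract can be checked numerically. The sketch below is illustrative only and is not the paper's code: it assumes the weight of each document is its word count, and the names `weighted_cross_entropy` and `perplexity`, the toy counts, and the random word distributions are all hypothetical.

```python
import numpy as np

def softmax(a):
    # Row-wise softmax, shifted for numerical stability.
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def weighted_cross_entropy(counts, probs):
    # Cross-entropy between each document's normalised word histogram and
    # its reconstructed word distribution, weighted by document length
    # (assumed weighting) and averaged over all words in the corpus.
    doc_len = counts.sum(axis=1, keepdims=True)
    hist = counts / doc_len
    per_doc = -(hist * np.log(probs)).sum(axis=1)
    weights = doc_len[:, 0]
    return (weights * per_doc).sum() / weights.sum()

def perplexity(counts, probs):
    # Standard topic-model perplexity: exp of the negative
    # per-word log-likelihood of the observed counts.
    log_lik = (counts * np.log(probs)).sum()
    return np.exp(-log_lik / counts.sum())

rng = np.random.default_rng(0)
# Toy corpus: 4 documents, 7 vocabulary words, every count >= 1.
counts = rng.integers(1, 6, size=(4, 7)).astype(float)
# Stand-in for the decoder output: one word distribution per document.
probs = softmax(rng.normal(size=(4, 7)))

ce = weighted_cross_entropy(counts, probs)
ppl = perplexity(counts, probs)
print(np.exp(ce), ppl)  # the two values agree up to rounding
```

Under this weighting the perplexity is exactly the exponential of the weighted cross-entropy, so lowering one lowers the other. A uniform reconstruction over 7 words gives a perplexity of 7, which is a convenient sanity check.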


Research summary

This paper presents a new application of Dynamic Factor Graphs (DFGs) to topic modeling, text classification and information retrieval. The model builds on the auto-encoder architecture: a nonlinear multi-layer model is trained stage-wise to produce compact representations of bags-of-words at the document or paragraph level, thereby performing a semantic analysis. The model also includes simple temporal dynamics on the latent representations, which exploit the hierarchical structure of document sequences, and it can simultaneously perform supervised classification or regression on document labels. The model is learned by maximizing the joint likelihood of the encoding, decoding, dynamical and supervised modules, using approximate, gradient-based maximum-a-posteriori inference. The paper shows that minimizing the weighted cross-entropy loss between histograms of word occurrences and their reconstruction minimizes the perplexity of the topic model, and that the model achieves lower perplexity than Latent Dirichlet Allocation on the NIPS and State of the Union datasets. It also shows how the dynamical constraints help learning and make it possible to visualize topic trajectories.

Critical review

Critical assessment: The paper makes a valuable contribution to topic modeling and information retrieval using Dynamic Factor Graphs and auto-encoders. However, some points need further clarification or improvement. For example, it would be useful to give more detail on how the model's various parameters are chosen and how they affect performance. A more detailed comparison with other models in the literature would also help to establish the actual benefits of the proposed model. Finally, more attention could be given to applications of the model outside text analysis, such as image analysis or other temporal data.

Questions related to the research

What is the main goal of this research?

The main goal is to present a new application of Dynamic Factor Graphs to topic modeling, text classification and information retrieval.

What is the core technique used in this research?

The core technique is a nonlinear multi-layer auto-encoder.

How is the proposed model learned?

The model is learned by maximizing the joint likelihood of the encoding, decoding, dynamical and supervised modules, using approximate, gradient-based maximum-a-posteriori inference.

What is the main benefit of the dynamical constraints in the model?

The dynamical constraints help learning and make it possible to visualize topic trajectories, which improves understanding of the hierarchical structure of the document sequence.
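The learning principle described in the abstract (a joint objective over encoding, decoding, dynamical and supervised modules, with gradient-based MAP inference of the latent codes) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the modules are linear rather than the paper's nonlinear multi-layer ones, the dynamics are a simple random walk, the module parameters are held fixed, and the names `joint_loss_and_grad`, `W_enc`, `W_dec`, `w_cls` and `lam` are hypothetical.

```python
import numpy as np

def softmax(a):
    # Row-wise softmax, shifted for numerical stability.
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def joint_loss_and_grad(Z, X, y, W_enc, W_dec, w_cls, lam=1.0):
    # Negative joint log-likelihood (up to constants) of four toy modules,
    # together with its gradient with respect to the latent codes Z.
    hist = X / X.sum(axis=1, keepdims=True)
    P = softmax(Z @ W_dec)            # decoder: latent code -> word distribution
    enc_res = Z - hist @ W_enc        # encoder: histogram -> predicted code
    dyn_res = Z[1:] - Z[:-1]          # dynamics: code at t should stay near t-1
    sup_res = Z @ w_cls - y           # supervised: regression on document labels
    loss = (-(X * np.log(P)).sum()
            + 0.5 * (enc_res ** 2).sum()
            + 0.5 * lam * (dyn_res ** 2).sum()
            + 0.5 * (sup_res ** 2).sum())
    grad = (X.sum(axis=1, keepdims=True) * P - X) @ W_dec.T
    grad += enc_res
    grad[1:] += lam * dyn_res
    grad[:-1] -= lam * dyn_res
    grad += np.outer(sup_res, w_cls)
    return loss, grad

rng = np.random.default_rng(0)
T, V, k = 6, 7, 3                     # time steps, vocabulary size, latent size
X = rng.integers(1, 6, size=(T, V)).astype(float)   # toy word counts over time
y = rng.normal(size=T)                               # toy document labels
W_enc = 0.1 * rng.normal(size=(V, k))
W_dec = 0.1 * rng.normal(size=(k, V))
w_cls = 0.1 * rng.normal(size=k)

# Approximate MAP inference: gradient descent on the latent codes only.
Z = np.zeros((T, k))
losses = []
for _ in range(200):
    loss, grad = joint_loss_and_grad(Z, X, y, W_enc, W_dec, w_cls)
    losses.append(loss)
    Z -= 0.02 * grad

print(losses[0], losses[-1])          # the joint loss decreases during inference
```

In a full training loop the module parameters would also be updated, alternating with the inference step shown here. The rows of `Z` form the sequence of latent codes over time, which is the kind of quantity one would plot to visualize a topic trajectory.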




