
RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data


Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: textual data, consistent time series construction


We propose a rolling version of Latent Dirichlet Allocation (LDA), called RollingLDA. Using a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of the LDA model. After an initial model is fitted, updates can be computed efficiently, allowing real-time monitoring and the detection of events or structural breaks. For this purpose, we propose suitable similarity measures for topics and provide simulation evidence of superiority over other commonly used approaches. The adequacy of the resulting method is illustrated by an application to an example corpus. In particular, we compute the similarity of sequentially obtained topic and word distributions over consecutive time periods. For a representative example corpus consisting of The New York Times articles from 1980 to 2020, we analyze the effect of several tuning parameter choices, and we run RollingLDA on the full dataset of approximately 4 million articles to demonstrate its feasibility.
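
The abstract describes comparing topic and word distributions across consecutive time periods in order to detect events or structural breaks. The following is a minimal Python sketch of that monitoring idea, assuming cosine similarity between a topic's word-count vectors in consecutive periods and a fixed, hypothetical threshold of 0.5. It is not necessarily the similarity measure proposed for RollingLDA, and the function names are illustrative only.

```python
import math
from typing import Dict, List

# Assumed input: for one topic, a word-count vector per time period
# (how often each word was assigned to that topic in that period).
Counts = Dict[str, int]


def cosine_similarity(a: Counts, b: Counts) -> float:
    """Cosine similarity of two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consecutive_similarities(periods: List[Counts]) -> List[float]:
    """Similarity of each period's topic vector to the previous period's."""
    return [cosine_similarity(prev, cur) for prev, cur in zip(periods, periods[1:])]


def flag_changes(periods: List[Counts], threshold: float = 0.5) -> List[int]:
    """Indices of periods whose similarity to the previous period drops below
    the (hypothetical) threshold, i.e. candidate events or structural breaks."""
    sims = consecutive_similarities(periods)
    return [i + 1 for i, s in enumerate(sims) if s < threshold]


if __name__ == "__main__":
    topic_over_time = [
        {"market": 10, "stocks": 8, "bank": 2},
        {"market": 9, "stocks": 7, "bank": 3},
        {"virus": 12, "hospital": 6, "market": 1},  # abrupt vocabulary shift
    ]
    print(consecutive_similarities(topic_over_time))
    print(flag_changes(topic_over_time))  # → [2]
```

Because RollingLDA keeps topics consistent with previous model states, the same topic index can be tracked across periods. That is the premise this comparison relies on.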


Improving Privacy Guarantee and Efficiency of Latent Dirichlet Allocation Model Training Under Differential Privacy

Association for Computational Linguistics, 2021 (article)

Latent Dirichlet allocation (LDA), a widely used topic model, is often employed as a fundamental tool for text analysis in various applications. However, training an LDA model typically requires a massive text corpus. On the one hand, such massive data may expose private information in the training data, raising significant privacy concerns. On the other hand, training efficiency may suffer, since LDA training must process these massive corpora. To address the privacy issues in LDA training, some recent works have combined LDA training algorithms based on collapsed Gibbs sampling (CGS) with differential privacy. Nevertheless, these works usually incur a high accumulated privacy budget because of the large number of CGS iterations. Moreover, they suffer from low efficiency because they must handle massive text corpora. To improve the privacy guarantee and efficiency, we combine a subsampling method with CGS and propose a novel differentially private LDA training algorithm, SUB-LDA. We find that subsampling in CGS naturally improves efficiency while amplifying privacy. We propose a novel metric, the efficiency–privacy function, to evaluate improvements in the privacy guarantee and efficiency. Building on a conventional subsampling method, we propose an adaptive subsampling method that improves the utility of the model produced by SUB-LDA when the subsampling ratio is small. We provide a comprehensive analysis of SUB-LDA, and the experimental results validate its improvements in efficiency and privacy guarantee.

Keywords: latent Dirichlet allocation, Dirichlet allocation model, latent Dirichlet
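
The central observation above is that subsampling in CGS both speeds up training and amplifies privacy. As a rough illustration, the sketch below uses the standard amplification-by-subsampling bound for an ε-differentially-private step run on a Poisson subsample with sampling rate q, ε' = ln(1 + q(e^ε − 1)), combined with basic sequential composition over T Gibbs iterations. This generic textbook bound is an assumption here. SUB-LDA's own privacy analysis, its adaptive subsampling, and its efficiency–privacy function may differ. The function names are hypothetical.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Per-step privacy cost of an epsilon-DP step applied to a Poisson
    subsample with sampling rate q: ln(1 + q * (e^epsilon - 1))."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    # log1p/expm1 keep the computation accurate for small q and epsilon.
    return math.log1p(q * math.expm1(epsilon))


def total_budget(epsilon: float, q: float, iterations: int) -> float:
    """Accumulated budget over `iterations` steps under basic sequential
    composition (a simple, loose upper bound)."""
    return iterations * amplified_epsilon(epsilon, q)


if __name__ == "__main__":
    eps, iters = 1.0, 100
    for q in (1.0, 0.1, 0.01):
        print(f"q={q:<5} per-step eps'={amplified_epsilon(eps, q):.4f} "
              f"total over {iters} iterations={total_budget(eps, q, iters):.2f}")
```

Suppose each iteration costs ε = 1 and there are 100 iterations. Without subsampling the accumulated budget is 100. With q = 0.01 the bound gives roughly 1.7. This illustrates why a high iteration count hurts non-subsampled approaches.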

Truth-Conditional Captions for Time Series Data

Association for Computational Linguistics, 2021 (article)

In this paper, we explore the task of automatically generating natural language descriptions of salient patterns in a time series, such as a company's stock prices over a week. A model for this task should be able to extract high-level patterns such as the presence of a peak or a dip. While typical contemporary neural models with attention mechanisms can generate fluent output descriptions for this task, they often generate factually incorrect descriptions. We propose a computational model with a truth-conditional architecture that first runs small learned programs on the input time series, then identifies the programs/patterns that hold true for the given input, and finally conditions on *only* the chosen valid program (rather than the input time series) to generate the output text description. A program in our model is constructed from modules, which are small neural networks designed to capture numerical patterns and temporal information. The modules are shared across multiple programs, enabling compositionality as well as efficient learning of module parameters. The modules, as well as their composition, are unobserved in the data, and we learn them end to end, with the only training signal coming from the accompanying natural language text descriptions. We find that the proposed model is able to generate high-precision captions even though we consider a small and simple space of module types.

Keywords: input time series, time series data
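
The truth-conditional flow described above has three steps: run candidate programs on the series, keep only those that hold, and generate text from the chosen program alone. The sketch below illustrates this flow without any learning. It substitutes hand-written predicates for the paper's learned neural modules and uses fixed templates in place of a neural decoder. The program names, predicates, and templates are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A "program" here is (name, predicate over the series, caption template).
# In the paper, programs are composed from learned neural modules; these
# hand-written rules only illustrate the select-then-generate structure.
Program = Tuple[str, Callable[[Sequence[float]], bool], str]


def has_peak(xs: Sequence[float]) -> bool:
    """True if the (first) maximum lies strictly inside the series."""
    if len(xs) < 3:
        return False
    i = max(range(len(xs)), key=lambda k: xs[k])
    return 0 < i < len(xs) - 1


def has_dip(xs: Sequence[float]) -> bool:
    """True if the (first) minimum lies strictly inside the series."""
    if len(xs) < 3:
        return False
    i = min(range(len(xs)), key=lambda k: xs[k])
    return 0 < i < len(xs) - 1


def increases(xs: Sequence[float]) -> bool:
    """True if the series never decreases and ends higher than it starts."""
    if len(xs) < 2:
        return False
    return all(b >= a for a, b in zip(xs, xs[1:])) and xs[-1] > xs[0]


PROGRAMS: List[Program] = [
    ("peak", has_peak, "The series rises to a peak and then falls."),
    ("dip", has_dip, "The series drops to a dip and then recovers."),
    ("increase", increases, "The series increases steadily."),
]


def caption(xs: Sequence[float]) -> Optional[str]:
    """Pick the first program that holds for `xs`, then generate text
    conditioned only on that program (never on the raw series)."""
    valid = [template for _, holds, template in PROGRAMS if holds(xs)]
    return valid[0] if valid else None


if __name__ == "__main__":
    print(caption([1, 3, 7, 4, 2]))  # peak caption
    print(caption([4, 4, 4]))        # None: no program holds
```

Because the output depends only on a program that was verified on the input, the caption cannot assert a pattern that did not hold. This is the property the abstract associates with higher-precision captions.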

An Effective Algorithm for Arranging Two-Dimensional Image Samples into One-Dimensional Series

Damascus University, 2006 (research paper)

Signal processing systems have developed clearly and rapidly. This development is driven both by the availability of modern techniques in electronic systems and by mathematical algorithms that are effective and well suited to signal processing. One of the most important applications of signal processing is digital image processing. Sampling is regarded as one of the basic and most important operations in signal processing, since it yields samples that can faithfully represent the original image. In this paper, we present an effective algorithm for arranging the samples of a two-dimensional image into a one-dimensional series. The resulting series of samples can represent the image while preserving its general structure. The correlation between neighboring image points is also respected, and subsequent processing can be carried out at a lower computational cost.

Keywords: image processing, signal processing, computer vision
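
The abstract does not describe the ordering its algorithm produces. The sketch below uses a serpentine (boustrophedon) scan as one well-known example of a 2D-to-1D arrangement that respects neighborhood correlation. The scan reverses direction on alternate rows, so consecutive samples in the output series are always adjacent pixels in the image. This is a stand-in technique, not the paper's algorithm, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def serpentine_order(rows: int, cols: int) -> List[Tuple[int, int]]:
    """(row, col) coordinates in boustrophedon order: left to right on even
    rows, right to left on odd rows, so consecutive coordinates are 4-adjacent."""
    order: List[Tuple[int, int]] = []
    for r in range(rows):
        cols_in_row = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cols_in_row)
    return order


def flatten_serpentine(image: Sequence[Sequence[T]]) -> List[T]:
    """Arrange a rectangular 2D grid of samples into a 1D series."""
    if not image:
        return []
    rows, cols = len(image), len(image[0])
    return [image[r][c] for r, c in serpentine_order(rows, cols)]


def unflatten_serpentine(series: Sequence[T], rows: int, cols: int) -> List[List[Optional[T]]]:
    """Inverse mapping: rebuild the 2D grid from a serpentine series."""
    image: List[List[Optional[T]]] = [[None] * cols for _ in range(rows)]
    for value, (r, c) in zip(series, serpentine_order(rows, cols)):
        image[r][c] = value
    return image


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6]]
    print(flatten_serpentine(img))  # → [1, 2, 3, 6, 5, 4]
```

A plain row-major (raster) scan would instead jump from the last pixel of one row to the first pixel of the next. That breaks adjacency at every row boundary, which the serpentine order avoids.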

An Algorithm for Continuously Edge Coloring a Set of Graphs

Al-Baath University, 2016 (research paper)

The graph k-colorability problem (GCP) is a well-known NP-hard problem. It consists of finding the minimum number k of colors needed to paint the vertices of a graph so that any two adjacent vertices (i.e., vertices connected by an edge) always receive different colors. The analogous edge-coloring question asks how the edges of a graph can be colored so that any two edges sharing a vertex always receive different colors. In this paper, we introduce a new, effective algorithm for coloring the edges of a graph. The proposed algorithm achieves a Continuously Edge Coloring (CEC) for a set of known graphs.

Keywords: graph, Graph Coloring Problem, edge coloring, graph coloring algorithm, Continuously Edge Coloring
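
The sketch below makes the edge-coloring constraint concrete with a simple greedy method. Edges are processed in the given order, and each receives the smallest color not already used by an edge at either of its endpoints. This generic baseline always produces a proper coloring with at most 2Δ − 1 colors, where Δ is the maximum degree. It is not the paper's CEC algorithm, whose continuity criterion the abstract does not define. The function names are hypothetical, and the input is assumed to be a simple graph with no duplicate edges.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_edge_coloring(edges: List[Edge]) -> Dict[Edge, int]:
    """Give each edge the smallest color (0, 1, 2, ...) not yet used by any
    edge that shares one of its endpoints."""
    used_at: Dict[Hashable, Set[int]] = defaultdict(set)  # colors present at each vertex
    coloring: Dict[Edge, int] = {}
    for u, v in edges:
        taken = used_at[u] | used_at[v]
        color = 0
        while color in taken:
            color += 1
        coloring[(u, v)] = color
        used_at[u].add(color)
        used_at[v].add(color)
    return coloring


def is_proper(coloring: Dict[Edge, int]) -> bool:
    """True if no two edges sharing a vertex have the same color."""
    seen: Dict[Hashable, Set[int]] = defaultdict(set)
    for (u, v), color in coloring.items():
        if color in seen[u] or color in seen[v]:
            return False
        seen[u].add(color)
        seen[v].add(color)
    return True


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (2, 0)]
    print(greedy_edge_coloring(triangle))  # → {(0, 1): 0, (1, 2): 1, (2, 0): 2}
```

The 2Δ − 1 bound holds because an edge (u, v) conflicts with at most (deg(u) − 1) + (deg(v) − 1) already-colored edges. Greedy coloring can therefore exceed the optimum, which is one motivation for specialized algorithms such as the one proposed above.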

A New Algorithm for Data Clustering and Enhancing K-Means Algorithm

Al-Baath University, 2016 (research paper)

This paper introduces a new algorithm that solves some of the problems from which data clustering algorithms such as K-Means suffer. The new algorithm can cluster data on its own, without the need for any other clustering algorithm.

Keywords: clustering, centroid, data mining, data, objects, isolated points, square error
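
For reference, the sketch below shows the standard K-Means baseline that the paper sets out to improve, known as Lloyd's algorithm. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Neither step increases the sum of squared errors (SSE). This is the textbook baseline, not the paper's new algorithm. Its dependence on the initial centroids and its sensitivity to isolated points (one of the keywords above) are typical weaknesses of K-Means. The function names and the explicit initialization argument are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100
           ) -> Tuple[List[Point], List[int], float]:
    """Lloyd's algorithm from the given (non-empty) initial centroids.
    Returns (centroids, labels, sse)."""
    centroids = list(init)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids: List[Point] = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(c)
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, init=[(0.0, 0.0), (10.0, 10.0)]))
    # → ([(0.0, 0.5), (10.0, 10.5)], [0, 0, 1, 1], 1.0)
```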
