Do you want to publish a course? Click here

RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data

Rollinglda: خوارزمية تحديث من مخصصات Dirichlet الكامنة للبناء سلسلة زمنية ثابتة من البيانات النصية

293   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

We propose a rolling version of the Latent Dirichlet Allocation, called RollingLDA. By a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of LDA models. After an initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks. For this purpose, we propose suitable similarity measures for topics and provide simulation evidence of superiority over other commonly used approaches. The adequacy of the resulting method is illustrated by an application to an example corpus. In particular, we compute the similarity of sequentially obtained topic and word distributions over consecutive time periods. For a representative example corpus consisting of The New York Times articles from 1980 to 2020, we analyze the effect of several tuning parameter choices and we run the RollingLDA method on the full dataset of approximately 4 million articles to demonstrate its feasibility.



References used
https://aclanthology.org/
rate research

Read More

Latent Dirichlet allocation (LDA), a widely used topic model, is often employed as a fundamental tool for text analysis in various applications. However, the training process of the LDA model typically requires massive text corpus data. On one hand, such massive data may expose private information in the training data, thereby incurring significant privacy concerns. On the other hand, the efficiency of the LDA model training may be impacted, since LDA training often needs to handle these massive text corpus data. To address the privacy issues in LDA model training, some recent works have combined LDA training algorithms that are based on collapsed Gibbs sampling (CGS) with differential privacy. Nevertheless, these works usually have a high accumulative privacy budget due to vast iterations in CGS. Moreover, these works always have low efficiency due to handling massive text corpus data. To improve the privacy guarantee and efficiency, we combine a subsampling method with CGS and propose a novel LDA training algorithm with differential privacy, SUB-LDA. We find that subsampling in CGS naturally improves efficiency while amplifying privacy. We propose a novel metric, the efficiency--privacy function, to evaluate improvements of the privacy guarantee and efficiency. Based on a conventional subsampling method, we propose an adaptive subsampling method to improve the model's utility produced by SUB-LDA when the subsampling ratio is small. We provide a comprehensive analysis of SUB-LDA, and the experiment results validate its efficiency and privacy guarantee improvements.
In this paper, we explore the task of automatically generating natural language descriptions of salient patterns in a time series, such as stock prices of a company over a week. A model for this task should be able to extract high-level patterns such as presence of a peak or a dip. While typical contemporary neural models with attention mechanisms can generate fluent output descriptions for this task, they often generate factually incorrect descriptions. We propose a computational model with a truth-conditional architecture which first runs small learned programs on the input time series, then identifies the programs/patterns which hold true for the given input, and finally conditions on *only* the chosen valid program (rather than the input time series) to generate the output text description. A program in our model is constructed from modules, which are small neural networks that are designed to capture numerical patterns and temporal information. The modules are shared across multiple programs, enabling compositionality as well as efficient learning of module parameters. The modules, as well as the composition of the modules, are unobserved in data, and we learn them in an end-to-end fashion with the only training signal coming from the accompanying natural language text descriptions. We find that the proposed model is able to generate high-precision captions even though we consider a small and simple space of module types.
There has been a clear and rapid development in signal processing systems, this development comes as a result of the availability of modern techniques in electronic systems and also as a result of achieving mathematical algorithms which were effec tive and perfect for signal processing. One of the most important application in signal processing is the digital image processing techniques. Sampling process is regarded as one of the basic and important operations in signal processing, from which we obtain samples that can represent the original image in perfect way. We present in this essay an affective algorithm which helps to arrange onedimensional samples from two- dimensional samples image. This enables to obtain a series of samples which has an ability of representing images with concern of their general structure. Also the neighborhood correlation of image points is respected, in addition to carrying out the subsequent treatments with less mathematical cost.
As it’s known, The Graph k-Colorability Problem (GCP) is a wellknown NP-Hard Problem. This problem consists in finding the k minimum number of colors to paint the vertices of a graph in such a way that any two adjoined vertices, which are connecte d by an edge, have always different colors. In another words how can we color the edges of a graph in such a way that any two edges joined by a vertex have always different colors? In this paper we introduce a new effective algorithm for coloring the edges of the graph. Our proposed algorithm enables us to achieve a Continuously Edge Coloring (CEC) for a set of known graphs.
This paper introduces a new algorithm to solve some problems that data clustering algorithms such as K-Means suffer from. This new algorithm by itself is able to cluster data without the need of other clustering algorithms.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا