Clustering Left-Censored Multivariate Time-Series

Published by Irene Y. Chen

Publication date: 2021

Research field: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Irene Y. Chen - Rahul G. Krishnan - David Sontag

Machine Learning

Abstract

Unsupervised learning seeks to uncover patterns in data. However, different kinds of noise may impede the discovery of useful substructure from real-world time-series data. In this work, we focus on mitigating the interference of left-censorship in the task of clustering. We provide conditions under which clusters and left-censorship may be identified; motivated by this result, we develop a deep generative, continuous-time model of time-series data that clusters while correcting for censorship time. We demonstrate accurate, stable, and interpretable results on synthetic data that outperform several benchmarks. To showcase the utility of our framework on real-world problems, we study how left-censorship can adversely affect the task of disease phenotyping, resulting in the often incorrect assumption that longitudinal patient data are aligned by disease stage. In reality, patients at the time of diagnosis are at different stages of the disease, both late and early, because they seek medical care at different times, and this discrepancy can confound unsupervised learning algorithms. On two clinical datasets, our model corrects for this form of censorship and recovers known clinical subtypes.
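As a rough illustration of the misalignment the abstract describes (not the authors' model), the sketch below assumes a hypothetical sigmoid biomarker trajectory and two subjects of the same subtype whose first observation occurs after different, unobserved delays. The function names, the sigmoid shape, and the delay values are all assumptions made for this example.

```python
import math

def trajectory(disease_time, rate=1.0, midpoint=5.0):
    """Hypothetical sigmoid biomarker as a function of true disease time."""
    return 1.0 / (1.0 + math.exp(-rate * (disease_time - midpoint)))

def observe(delta, n_obs=5, step=1.0):
    """Values seen at observed times 0, step, 2*step, ...

    delta is the (unobserved) disease time already elapsed at the first
    observation, i.e. the amount of left-censoring.
    """
    return [trajectory(delta + k * step) for k in range(n_obs)]

def mean_abs_gap(a, b):
    """Mean absolute difference over the overlapping prefix of two series."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n

early = observe(delta=1.0)  # first seen early in the disease
late = observe(delta=4.0)   # same subtype, first seen 3 time units later

# Comparing by observation time treats the two subjects as aligned.
gap_naive = mean_abs_gap(early, late)

# Shifting the early series by the delay difference (3 steps) aligns them
# by disease time instead: early[3:] and late[:2] cover disease times 4 and 5.
gap_corrected = mean_abs_gap(early[3:], late[:2])

print(f"aligned by observation time: {gap_naive:.3f}")
print(f"aligned by disease time:     {gap_corrected:.3f}")
```

In this toy setting the two series coincide once the delay is accounted for, whereas a comparison by observation time suggests two different patterns. In practice the delays are unknown and must be inferred jointly with the clusters, which is the problem the paper addresses.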

Ronak Mehta, Jaewon Chung, Cencheng Shen, 2019

Complex data structures such as time series are increasingly present in modern data science problems. A fundamental question is whether two such time-series are statistically dependent. Many current approaches make parametric assumptions on the random processes, only detect linear association, require multiple tests, or forfeit power in high-dimensional, nonlinear settings. Estimating the distribution of any test statistic under the null is non-trivial, as the permutation test is invalid. This work juxtaposes distance correlation (Dcorr) and multiscale graph correlation (MGC) from independence testing literature and block permutation from time series analysis to address these challenges. The proposed nonparametric procedure is valid and consistent, building upon prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low sample size, nonlinear settings. Neural connectivity is analyzed via fMRI data, revealing linear dependence of signals within the visual network and default mode network, and nonlinear relationships in other networks. This work uncovers a first-resort data analysis tool with open-source code available, directly impacting a wide range of scientific disciplines.
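The block permutation idea mentioned above can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the helper name `block_permute`, the block size, and the toy series are assumptions. Shuffling whole blocks rather than individual points keeps the short-range temporal order inside each block, which an ordinary permutation would destroy.

```python
import random

def block_permute(series, block_size, rng):
    """Shuffle contiguous blocks of a series, keeping order within each block."""
    blocks = [series[i:i + block_size] for i in range(0, len(series), block_size)]
    rng.shuffle(blocks)
    # Flatten the shuffled blocks back into a single series.
    return [x for block in blocks for x in block]

rng = random.Random(0)          # seeded for reproducibility
series = list(range(12))        # toy series: 0, 1, ..., 11
permuted = block_permute(series, block_size=4, rng=rng)
print(permuted)
```

In a test of dependence between two series, one would typically block-permute one series many times, recompute the test statistic each time, and compare the observed statistic with that null distribution.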

Machine Learning, Methodology

Multivariate Quantile Bayesian Structural Time Series (MQBSTS) Model

Ning Ning, 2020

In this paper, we propose the multivariate quantile Bayesian structural time series (MQBSTS) model for joint quantile time series forecasting, which is, to the authors' best knowledge, the first such model for correlated multivariate time series. The MQBSTS model also enables quantile-based feature selection in its regression component, where each time series has its own pool of contemporaneous external time series predictors; to the authors' best knowledge, this is the first fully data-driven quantile feature selection technique applicable to time series data. Unlike most machine learning algorithms, the MQBSTS model has very few hyper-parameters to tune, requires only small datasets to train, converges fast, and is executable on ordinary personal computers. Extensive examinations on simulated data and empirical data confirmed that the MQBSTS model has superior performance in feature selection, parameter estimation, and forecasting.

Machine Learning, Statistics Theory

A Deep Structural Model for Analyzing Correlated Multivariate Time Series

Changwei Hu, Yifan Hu, Sungyong Seo, 2020

Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a nonlinear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category.

Machine Learning

Multivariate Bayesian Structural Time Series Model

S. Rao Jammalamadaka, Jinwen Qiu, Ning Ning, 2018

This paper deals with inference and prediction for multiple correlated time series, where one also has the choice of using a candidate pool of contemporaneous predictors for each target series. Starting with a structural model for the time-series, Bayesian tools are used for model fitting, prediction, and feature selection, thus extending some recent work along these lines for the univariate case. The Bayesian paradigm in this multivariate setting helps the model avoid overfitting as well as capture correlations among the multiple time series with the various state components. The model provides needed flexibility to choose a different set of components and available predictors for each target series. The cyclical component in the model can handle large variations in the short term, which may be caused by external shocks. We run extensive simulations to investigate properties such as estimation accuracy and performance in forecasting. We then run an empirical study with one-step-ahead prediction on the max log return of a portfolio of stocks that involve four leading financial institutions. Both the simulation studies and the extensive empirical study confirm that this multivariate model outperforms three other benchmark models, viz. a model that treats each target series as independent, the autoregressive integrated moving average model with regression (ARIMAX), and the multivariate ARIMAX (MARIMAX) model.

Machine Learning

A self-organising eigenspace map for time series clustering

Donya Rahmani, Damien Fay, Jacek Brodzki, 2019

This paper presents a novel time series clustering method, the self-organising eigenspace map (SOEM), based on a generalisation of the well-known self-organising feature map (SOFM). The SOEM operates on the eigenspaces of the embedded covariance structures of time series, which are related directly to modes in those time series. Approximate joint diagonalisation acts as a pseudo-metric across these spaces, allowing us to generalise the SOFM to a neural network with matrix input. The technique is empirically validated against three sets of experiments: univariate time series clustering, multivariate time series clustering, and application to (clustered) multivariate time series forecasting. Results indicate that the technique performs a valid topologically ordered clustering of the time series. The clustering is superior to standard benchmarks when the data are non-aligned, gives the best clustering stage when used in forecasting, can be used with partial or non-overlapping time series and with multivariate clustering, and produces a topological representation of the time series objects.

Machine Learning, Applications of Statistics