A self-organising eigenspace map for time series clustering

521 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Donya Rahmani

تاريخ النشر 2019

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Donya Rahmani - Damien Fay - Jacek Brodzki

التعلم الالي التعلم الآلي تطبيقات الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper presents a novel time series clustering method, the self-organising eigenspace map (SOEM), based on a generalisation of the well-known self-organising feature map (SOFM). The SOEM operates on the eigenspaces of the embedded covariance structures of time series which are related directly to modes in those time series. Approximate joint diagonalisation acts as a pseudo-metric across these spaces allowing us to generalise the SOFM to a neural network with matrix input. The technique is empirically validated against three sets of experiments; univariate and multivariate time series clustering, and application to (clustered) multi-variate time series forecasting. Results indicate that the technique performs a valid topologically ordered clustering of the time series. The clustering is superior in comparison to standard benchmarks when the data is non-aligned, gives the best clustering stage for when used in forecasting, and can be used with partial/non-overlapping time series, multivariate clustering and produces a topological representation of the time series objects.

قيم البحث

اقرأ أيضاً

Time-series Scenario Forecasting

186 - Sriharsha Veeramachaneni 2012

Many applications require the ability to judge uncertainty of time-series forecasts. Uncertainty is often specified as point-wise error bars around a mean or median forecast. Due to temporal dependencies, such a method obscures some information. We w ould ideally have a way to query the posterior probability of the entire time-series given the predictive variables, or at a minimum, be able to draw samples from this distribution. We use a Bayesian dictionary learning algorithm to statistically generate an ensemble of forecasts. We show that the algorithm performs as well as a physics-based ensemble method for temperature forecasts for Houston. We conclude that the method shows promise for scenario forecasting where physics-based methods are absent.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

Clustering Left-Censored Multivariate Time-Series

184 - Irene Y. Chen , Rahul G. Krishnan , David Sontag 2021

Unsupervised learning seeks to uncover patterns in data. However, different kinds of noise may impede the discovery of useful substructure from real-world time-series data. In this work, we focus on mitigating the interference of left-censorship in t he task of clustering. We provide conditions under which clusters and left-censorship may be identified; motivated by this result, we develop a deep generative, continuous-time model of time-series data that clusters while correcting for censorship time. We demonstrate accurate, stable, and interpretable results on synthetic data that outperform several benchmarks. To showcase the utility of our framework on real-world problems, we study how left-censorship can adversely affect the task of disease phenotyping, resulting in the often incorrect assumption that longitudinal patient data are aligned by disease stage. In reality, patients at the time of diagnosis are at different stages of the disease -- both late and early due to differences in when patients seek medical care and such discrepancy can confound unsupervised learning algorithms. On two clinical datasets, our model corrects for this form of censorship and recovers known clinical subtypes.

التعلم الالي التعلم الآلي

Iterative Spectral Method for Alternative Clustering

447 - Chieh Wu , Stratis Ioannidis , Mario Sznaier 2019

Given a dataset and an existing clustering as input, alternative clustering aims to find an alternative partition. One of the state-of-the-art approaches is Kernel Dimension Alternative Clustering (KDAC). We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC. Our algorithm is intuitive, relies on easily implementable spectral decompositions, and comes with theoretical guarantees. Its computation time improves upon existing implementations of KDAC by as much as 5 orders of magnitude.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

Independence Testing for Multivariate Time Series

119 - Ronak Mehta , Jaewon Chung , Cencheng Shen 2019

Complex data structures such as time series are increasingly present in modern data science problems. A fundamental question is whether two such time-series are statistically dependent. Many current approaches make parametric assumptions on the rando m processes, only detect linear association, require multiple tests, or forfeit power in high-dimensional, nonlinear settings. Estimating the distribution of any test statistic under the null is non-trivial, as the permutation test is invalid. This work juxtaposes distance correlation (Dcorr) and multiscale graph correlation (MGC) from independence testing literature and block permutation from time series analysis to address these challenges. The proposed nonparametric procedure is valid and consistent, building upon prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low sample size, nonlinear settings. Neural connectivity is analyzed via fMRI data, revealing linear dependence of signals within the visual network and default mode network, and nonlinear relationships in other networks. This work uncovers a first-resort data analysis tool with open-source code available, directly impacting a wide range of scientific disciplines.

التعلم الالي التعلم الآلي المنهجية

Functional optimal transport: map estimation and domain adaptation for functional data

138 - Jiacheng Zhu , Aritra Guha , Dat Do 2021

We introduce a formulation of optimal transport problem for distributions on function spaces, where the stochastic map between functional domains can be partially represented in terms of an (infinite-dimensional) Hilbert-Schmidt operator mapping a Hi lbert space of functions to another. For numerous machine learning tasks, data can be naturally viewed as samples drawn from spaces of functions, such as curves and surfaces, in high dimensions. Optimal transport for functional data analysis provides a useful framework of treatment for such domains. In this work, we develop an efficient algorithm for finding the stochastic transport map between functional domains and provide theoretical guarantees on the existence, uniqueness, and consistency of our estimate for the Hilbert-Schmidt operator. We validate our method on synthetic datasets and study the geometric properties of the transport map. Experiments on real-world datasets of robot arm trajectories further demonstrate the effectiveness of our method on applications in domain adaptation.

التعلم الالي التعلم الآلي تطبيقات الإحصاء