Independence Testing for Multivariate Time Series

Published by Ronak Mehta
Publication date: 2019
Paper language: English




Complex data structures such as time series are increasingly present in modern data science problems. A fundamental question is whether two such time series are statistically dependent. Many current approaches make parametric assumptions about the random processes, detect only linear association, require multiple tests, or lose power in high-dimensional, nonlinear settings. Estimating the null distribution of any test statistic is non-trivial because the standard permutation test is invalid: serially dependent observations are not exchangeable. This work combines distance correlation (Dcorr) and multiscale graph correlation (MGC) from the independence testing literature with block permutation from time series analysis to address these challenges. The proposed nonparametric procedure is valid and consistent. It builds on prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low-sample-size, nonlinear settings. Neural connectivity is analyzed using fMRI data, revealing linear dependence between signals within the visual network and the default mode network, and nonlinear relationships in other networks. This work provides a first-resort data analysis tool, with open-source code available, that is directly applicable across a wide range of scientific disciplines.
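The abstract describes the procedure only at a high level. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' released code or their exact statistic. It computes the standard biased (V-statistic) squared sample distance correlation and calibrates it with a simple non-overlapping block permutation of y. It then estimates the dependence-maximizing lag by scanning dCor² between x[t] and y[t + j]. MGC is omitted. The block size, the lag convention, and every function name (`dcorr2`, `block_permute`, `block_permutation_test`, `best_lag`) are hypothetical.

```python
import math
import random

# Illustrative sketch only: function names, the block scheme, and the lag
# convention (x[t] paired with y[t + j]) are assumptions, not the paper's code.

def _centered_distances(points):
    """Double-centered Euclidean distance matrix for a list of equal-length tuples."""
    d = [[math.dist(p, q) for q in points] for p in points]
    n = len(d)
    means = [sum(row) / n for row in d]  # d is symmetric: row means == column means
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

def _dcorr2_from(a, b):
    """Squared sample distance correlation (V-statistic) from two centered matrices."""
    dcov = sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    var_a = sum(u * u for row in a for u in row)
    var_b = sum(v * v for row in b for v in row)
    # The common 1/n^2 factors cancel in the ratio.
    return dcov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def dcorr2(x, y):
    """Squared distance correlation between paired samples x[t], y[t]."""
    return _dcorr2_from(_centered_distances(x), _centered_distances(y))

def block_permute(seq, block_size, rng):
    """Shuffle non-overlapping contiguous blocks, keeping the order inside each block."""
    blocks = [seq[i:i + block_size] for i in range(0, len(seq), block_size)]
    rng.shuffle(blocks)
    return [item for block in blocks for item in block]

def block_permutation_test(x, y, block_size=5, reps=99, seed=0):
    """Return (dCor^2, p-value), with the null built by block-permuting y."""
    rng = random.Random(seed)
    a = _centered_distances(x)  # x stays fixed; only y is permuted
    observed = _dcorr2_from(a, _centered_distances(y))
    exceed = 0
    for _ in range(reps):
        b = _centered_distances(block_permute(y, block_size, rng))
        exceed += _dcorr2_from(a, b) >= observed
    return observed, (exceed + 1) / (reps + 1)

def best_lag(x, y, max_lag):
    """Return (j, scores): the lag j in 0..max_lag maximizing dCor^2(x[t], y[t + j])."""
    n = len(x)
    scores = [dcorr2(x[:n - j], y[j:]) for j in range(max_lag + 1)]
    return max(range(max_lag + 1), key=lambda j: scores[j]), scores

# Usage: y responds to x three steps later through a nonlinear (squared) map.
rng = random.Random(1)
x = [(rng.gauss(0, 1),) for _ in range(60)]
y = [(rng.gauss(0, 1),) for _ in range(3)] + [(v[0] ** 2,) for v in x[:-3]]
lag, _ = best_lag(x, y, max_lag=6)
stat, p = block_permutation_test(x[:len(x) - lag], y[lag:])
print(f"lag={lag}  dCor^2={stat:.3f}  p={p:.3f}")
```

Shuffling whole blocks keeps the short-range autocorrelation inside each block. That is why a block permutation null is a better approximation than an element-wise shuffle when observations are serially dependent. The block size is a tuning choice, and the abstract does not specify it.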



Read also

Cencheng Shen, 2020
A number of universally consistent dependence measures have recently been proposed for testing independence, such as distance correlation, kernel correlation, and multiscale graph correlation. They provide a satisfactory solution for dependence testing in low dimensions, but often exhibit decreasing power for high-dimensional data, a phenomenon that has been recognized but remains largely uncharted. In this paper, we aim to better understand high-dimensional testing scenarios and explore a procedure that is robust against increasing dimension. To that end, we propose the maximum marginal correlation method and characterize high-dimensional dependence structures via the notion of dependent dimensions. We prove that the maximum method can be valid and universally consistent for testing high-dimensional dependence under regularity conditions, and demonstrate when and how it may outperform other methods. The methodology can be implemented with most existing dependence measures, has superior testing power in a variety of common high-dimensional settings, and is computationally efficient for big data analysis when using the distance correlation chi-square test.
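As a rough illustration of the idea, the sketch below makes an assumption about its form, since the abstract does not give the exact statistic. It computes a dependence measure between every single dimension of X and every single dimension of Y, then takes the maximum. A signal confined to a few dependent dimensions is therefore not diluted by many independent ones. The marginal measure here is squared distance correlation, and the p-value comes from a plain row-permutation test that assumes i.i.d. rows. The paper's distance-correlation chi-square test is not reproduced, and all names are hypothetical.

```python
import math
import random

# Illustrative sketch of a "maximum marginal" dependence statistic; names and the
# plain i.i.d. permutation calibration are assumptions, not the paper's procedure.

def dcorr2_1d(u, v):
    """Squared sample distance correlation (V-statistic) between two 1-D samples."""
    def centered(s):
        n = len(s)
        d = [[abs(a - b) for b in s] for a in s]
        means = [sum(row) / n for row in d]  # symmetric, so row means == column means
        grand = sum(means) / n
        return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]
    a, b = centered(u), centered(v)
    dcov = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    var_a = sum(p * p for row in a for p in row)
    var_b = sum(q * q for row in b for q in row)
    return dcov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def max_marginal_dcorr(x, y):
    """Return (statistic, i, j): the largest dCor^2 over all column pairs (x_i, y_j)."""
    return max(
        (dcorr2_1d([row[i] for row in x], [row[j] for row in y]), i, j)
        for i in range(len(x[0]))
        for j in range(len(y[0]))
    )

def max_marginal_test(x, y, reps=99, seed=0):
    """Permutation p-value for the max statistic; assumes exchangeable (i.i.d.) rows."""
    rng = random.Random(seed)
    observed = max_marginal_dcorr(x, y)[0]
    exceed = 0
    for _ in range(reps):
        y_perm = list(y)
        rng.shuffle(y_perm)  # permuting whole rows breaks any x-y pairing
        exceed += max_marginal_dcorr(x, y_perm)[0] >= observed
    return observed, (exceed + 1) / (reps + 1)

# Usage: 20-dimensional x, where only column 5 drives a 1-D response.
rng = random.Random(2)
x = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(50)]
y = [[row[5] ** 2 + 0.1 * rng.gauss(0, 1)] for row in x]
print(max_marginal_dcorr(x, y))
```

The cost grows with the number of column pairs times n², so this naive version is only meant for small illustrations.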
Unsupervised learning seeks to uncover patterns in data. However, different kinds of noise may impede the discovery of useful substructure in real-world time-series data. In this work, we focus on mitigating the interference of left-censorship in the task of clustering. We provide conditions under which clusters and left-censorship may be identified; motivated by this result, we develop a deep generative, continuous-time model of time-series data that clusters while correcting for censorship time. We demonstrate accurate, stable, and interpretable results on synthetic data that outperform several benchmarks. To showcase the utility of our framework on real-world problems, we study how left-censorship can adversely affect the task of disease phenotyping, resulting in the often incorrect assumption that longitudinal patient data are aligned by disease stage. In reality, patients at the time of diagnosis are at different stages of the disease, both late and early, because of differences in when they seek medical care, and this discrepancy can confound unsupervised learning algorithms. On two clinical datasets, our model corrects for this form of censorship and recovers known clinical subtypes.
Multivariate time series are routinely encountered in real-world applications, and in many cases these time series are strongly correlated. In this paper, we present a deep learning structural time series model that can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. The trend is learned via a hierarchical neural network of 1D and 2D temporal CNNs and an LSTM. The CNN-LSTM architecture can (i) naturally leverage the dependency among multiple correlated time series, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize long-term sequential patterns. The seasonality component is approximated via a nonlinear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series datasets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices of corporate stocks in the same category.
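The seasonality component described above starts from Fourier terms. The following minimal sketch builds those features for a known period P and K harmonics, assuming the usual sin/cos pairs sin(2πkt/P) and cos(2πkt/P) for k = 1..K. The paper then applies a nonlinear function whose exact form the abstract does not give. As a stand-in, only the classic linear combination of the features is shown. The function names are hypothetical.

```python
import math

def fourier_terms(t_values, period, order):
    """Rows [sin(2*pi*k*t/P), cos(2*pi*k*t/P) for k = 1..order] for each time index t."""
    rows = []
    for t in t_values:
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend((math.sin(angle), math.cos(angle)))
        rows.append(row)
    return rows

def linear_seasonality(features, weights):
    """Classic linear seasonal term: a weighted sum of the Fourier features (a stand-in
    for the nonlinear function used in the paper, which is not specified here)."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]

# Usage: weekly seasonality on daily data (period 7) with 3 harmonics -> 6 features per day.
feats = fourier_terms(range(14), period=7, order=3)
season = linear_seasonality(feats, [1.0, 0.5, 0.0, 0.2, 0.0, 0.1])
print(len(feats[0]), [round(s, 3) for s in season[:7]])
```

Because every feature repeats with period P, any function of them, linear or nonlinear, is also periodic with period P. The number of harmonics K controls how sharp a seasonal shape can be represented.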
Ning Ning, 2020
In this paper, we propose the multivariate quantile Bayesian structural time series (MQBSTS) model for joint quantile time series forecasting, which, to the authors' best knowledge, is the first such model for correlated multivariate time series. The MQBSTS model also enables quantile-based feature selection in its regression component, where each time series has its own pool of contemporaneous external time series predictors; to the authors' best knowledge, this is the first fully data-driven quantile feature selection technique applicable to time series data. Unlike most machine learning algorithms, the MQBSTS model has very few hyperparameters to tune, requires only small datasets to train, converges quickly, and runs on ordinary personal computers. Extensive examinations on simulated and empirical data confirmed that the MQBSTS model performs strongly in feature selection, parameter estimation, and forecasting.
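The abstract does not spell out how quantile forecasts are scored. A common choice is the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}), where u = y − q, whose expected value is minimized by the τ-quantile. Bayesian quantile models often use an asymmetric Laplace likelihood, whose negative log-likelihood is proportional to this loss. The sketch below is a generic illustration of that loss under this assumption, not the MQBSTS model itself. The function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check (pinball) loss for quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit; over-prediction costs 1 - tau.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)

def best_constant_quantile(y, tau):
    """Among the observed values, the constant forecast with the lowest pinball loss."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Usage: the loss-minimizing constant tracks the empirical tau-quantile.
data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0]
print(best_constant_quantile(data, 0.5))  # prints 4.0, the sample median
```

The asymmetry of the loss is what turns a point forecast into a quantile forecast. With τ = 0.9, over-predicting costs only 0.1 per unit while under-predicting costs 0.9, which pushes the optimum up to an upper quantile.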
This paper deals with inference and prediction for multiple correlated time series, where one also has the option of using a candidate pool of contemporaneous predictors for each target series. Starting with a structural model for the time series, Bayesian tools are used for model fitting, prediction, and feature selection, extending recent work along these lines for the univariate case. The Bayesian paradigm in this multivariate setting helps the model avoid overfitting and capture correlations among the multiple time series through the various state components. The model provides the flexibility to choose a different set of components and available predictors for each target series. The cyclical component in the model can handle large short-term variations, which may be caused by external shocks. We run extensive simulations to investigate properties such as estimation accuracy and forecasting performance. We then run an empirical study with one-step-ahead prediction of the maximum log return of a portfolio of stocks involving four leading financial institutions. Both the simulation studies and the extensive empirical study confirm that this multivariate model outperforms three benchmark models: a model that treats each target series as independent, the autoregressive integrated moving average model with regression (ARIMAX), and the multivariate ARIMAX (MARIMAX) model.
