We introduce new quantities for exploratory causal inference between bivariate time series. The quantities, called penchants and leanings, are computationally straightforward to apply, follow directly from assumptions of probabilistic causality, do not depend on any assumed models for the time series generating process, and do not rely on any embedding procedures; these features may provide a clearer interpretation of the results than those from existing time series causality tools. The penchant and leaning are computed based on a structured method for computing probabilities.
Linear causal analysis is central to a wide range of important applications spanning finance, the physical sciences, and engineering. Much of the existing literature on linear causal analysis operates in the time domain. Unfortunately, the direct application of time domain linear causal analysis to many real-world time series presents three critical challenges: irregular temporal sampling, long range dependencies, and scale. Real-world data is often collected at irregular time intervals across vast arrays of decentralized sensors and with long range dependencies, which make naive time domain correlation estimators spurious. In this paper we present a frequency domain based estimation framework which naturally handles irregularly sampled data and long range dependencies while enabling memory and communication efficient distributed processing of time series data. By operating in the frequency domain we eliminate the need to interpolate and help mitigate the effects of long range dependencies. We implement and evaluate our new workflow in the distributed setting using Apache Spark and demonstrate on both Monte Carlo simulations and high-frequency financial trading data that we can accurately recover causal structure at scale.
Analyzing data from paleoclimate archives such as tree rings or lake sediments offers the opportunity of inferring information on past climate variability. Often, such data sets are univariate and a proper reconstruction of the system's higher-dimensional phase space can be crucial for further analyses. In this study, we systematically compare the methods of time delay embedding and differential embedding for phase space reconstruction. Differential embedding relates the system's higher-dimensional coordinates to the derivatives of the measured time series. For implementation, this requires robust and efficient algorithms to estimate derivatives from noisy and possibly non-uniformly sampled data. For this purpose, we consider several approaches: (i) central differences adapted to irregular sampling, (ii) a generalized version of discrete Legendre coordinates, and (iii) the concept of Moving Taylor Bayesian Regression. We evaluate the performance of differential and time delay embedding by studying two paradigmatic model systems, the Lorenz and the Rössler system. More precisely, we compare geometric properties of the reconstructed attractors to those of the original attractors by applying recurrence network analysis. Finally, we demonstrate the potential and the limitations of using the different phase space reconstruction methods in combination with windowed recurrence network analysis for inferring information about past climate variability. This is done by analyzing two well-studied paleoclimate data sets from Ecuador and Mexico. We find that studying the robustness of the results when varying the analysis parameters is an unavoidable step in order to make well-grounded statements on climate variability and to judge whether a data set is suitable for this kind of analysis.
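To make the two reconstruction ideas in the abstract above concrete, here is a minimal Python sketch, not the authors' implementation. It shows a standard time delay embedding and a two-dimensional differential embedding whose derivative comes from a three-point central difference on a non-uniform grid. The function names, the choice of a three-point stencil, and the restriction to a first derivative are assumptions made for illustration; the abstract does not specify the exact formulas used.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time delay embedding: row k is (x[k], x[k+tau], ..., x[k+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def central_diff_irregular(t, x):
    """First derivative at interior points of a non-uniformly sampled series.

    Uses the three-point stencil obtained by differentiating the quadratic
    through (t[i-1], t[i], t[i+1]); it reduces to the usual central
    difference when the spacing is uniform.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    h1 = t[1:-1] - t[:-2]   # spacing to the left neighbour
    h2 = t[2:] - t[1:-1]    # spacing to the right neighbour
    return (-h2 / (h1 * (h1 + h2)) * x[:-2]
            + (h2 - h1) / (h1 * h2) * x[1:-1]
            + h1 / (h2 * (h1 + h2)) * x[2:])

def differential_embed_2d(t, x):
    """Hypothetical helper: 2-D differential embedding (x, dx/dt) at interior points."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[1:-1], central_diff_irregular(t, x)])
```

The stencil is exact for quadratic signals, which gives a simple sanity check on irregular grids. On noisy data it amplifies noise, which is presumably one reason the abstract also considers the regression-based alternatives (ii) and (iii).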
The method of surrogates is one of the key concepts of nonlinear data analysis. Here, we demonstrate that commonly used algorithms for generating surrogates often fail to generate truly linear time series. Rather, they create surrogate realizations with Fourier phase correlations, leading to non-detections of nonlinearities. We argue that reliable surrogates can be generated only if one tests separately for static and dynamic nonlinearities.
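As background for the abstract above, the following Python sketch illustrates the basic idea behind Fourier-based surrogates: keep the amplitude spectrum of the data and randomize the Fourier phases. It is a minimal illustration under stated assumptions, not one of the specific algorithms the abstract evaluates, which it does not name. The function name `ft_surrogate` and its `seed` parameter are hypothetical.

```python
import numpy as np

def ft_surrogate(x, seed=None):
    """Phase-randomized surrogate with the same amplitude spectrum as x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.fft.rfft(x)
    # Draw an independent uniform phase shift for every frequency bin.
    shift = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    shift[0] = 0.0            # leave the zero-frequency (mean) term untouched
    if n % 2 == 0:
        shift[-1] = 0.0       # the Nyquist bin must stay real for even n
    # Rotating each coefficient changes its phase but not its magnitude.
    return np.fft.irfft(spec * np.exp(1j * shift), n=n)
```

By construction, such a surrogate has the same power spectrum, and hence the same linear autocorrelation, as the original series. The abstract's concern is with the phases: a surrogate can serve as a linear reference only if its Fourier phases are free of correlations, which could be checked on the output of any surrogate generator.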
Data series generated by complex systems exhibit fluctuations on many time scales and/or broad distributions of the values. In both equilibrium and non-equilibrium situations, the natural fluctuations are often found to follow a scaling relation over several orders of magnitude, allowing for a characterisation of the data and the generating complex system by fractal (or multifractal) scaling exponents. In addition, fractal and multifractal approaches can be used for modelling time series and deriving predictions regarding extreme events. This review article describes and exemplifies several methods originating from Statistical Physics and Applied Mathematics, which have been used for fractal and multifractal time series analysis.
The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.