The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.
We develop a method for the multifractal characterization of nonstationary time series, which is based on a generalization of the detrended fluctuation analysis (DFA). We relate our multifractal DFA method to the standard partition function-based multifractal formalism, and prove that both approaches are equivalent for stationary signals with compact support. By analyzing several examples we show that the new method can reliably determine the multifractal scaling behavior of time series. By comparing the multifractal DFA results for original series to those for shuffled series we can distinguish multifractality due to long-range correlations from multifractality due to a broad probability density function. We also compare our results with the wavelet transform modulus maxima (WTMM) method, and show that the results are equivalent.
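The abstract above does not spell out the algorithm, so the following is only a minimal sketch of how a multifractal DFA is commonly set up: integrate the series into a profile, split it into windows of length s from both ends, remove a fitted trend from each window, and average the q-th powers of the residual variances. The linear (first-order) detrending, the function names, the choice of scales, and the white-noise test signal are all assumptions made for illustration, not details taken from the paper. Window lengths should be at least 4 so that the straight-line fit leaves a non-zero residual.

```python
import math
import random


def detrended_variance(segment):
    """Mean squared residual after removing a least-squares straight line."""
    n = len(segment)
    mean_t = (n - 1) / 2.0
    mean_y = sum(segment) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(segment))
    slope = s_ty / s_tt
    return sum((y - mean_y - slope * (t - mean_t)) ** 2
               for t, y in enumerate(segment)) / n


def fluctuation_function(series, scale, q):
    """q-th order fluctuation function F_q(scale), linear detrending (assumed)."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for x in series:  # cumulative sum of the mean-removed series
        running += x - mean
        profile.append(running)
    n_seg = len(profile) // scale
    tail = len(profile) - n_seg * scale
    variances = []
    for start in (0, tail):  # windows counted from the start, then from the end
        for v in range(n_seg):
            lo = start + v * scale
            variances.append(detrended_variance(profile[lo:lo + scale]))
    if q == 0:  # logarithmic average replaces the power mean at q = 0
        return math.exp(sum(math.log(f2) for f2 in variances) / (2 * len(variances)))
    return (sum(f2 ** (q / 2.0) for f2 in variances) / len(variances)) ** (1.0 / q)


def generalized_hurst(series, scales, q):
    """Slope of log F_q(s) against log s, estimated by least squares."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(fluctuation_function(series, s, q)) for s in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]  # uncorrelated test signal
scales = [16, 32, 64, 128, 256]
hurst_q2 = generalized_hurst(noise, scales, 2)

# Shuffled surrogate: destroys temporal correlations, keeps the value distribution.
shuffled = noise[:]
rng.shuffle(shuffled)
hurst_q2_shuffled = generalized_hurst(shuffled, scales, 2)
```

Comparing generalized_hurst for several values of q on the original and on the shuffled series mirrors the comparison described in the abstract: a q-dependence that survives shuffling would point to a broad probability density rather than to long-range correlations.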
Data series generated by complex systems exhibit fluctuations on many time scales and/or broad distributions of the values. In both equilibrium and non-equilibrium situations, the natural fluctuations are often found to follow a scaling relation over several orders of magnitude, allowing for a characterisation of the data and the generating complex system by fractal (or multifractal) scaling exponents. In addition, fractal and multifractal approaches can be used for modelling time series and deriving predictions regarding extreme events. This review article describes and exemplifies several methods originating from Statistical Physics and Applied Mathematics, which have been used for fractal and multifractal time series analysis.
A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping. Most importantly, the selected features provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
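As a rough illustration of greedy forward feature selection, the sketch below reduces each time series to three hand-picked features and adds, one at a time, the feature that most improves classification accuracy. Everything specific here is an assumption made for the example: the three features stand in for the thousands used in the paper, a nearest-class-mean rule (whose decision boundary is linear for two classes) stands in for the linear classifier, accuracy is measured in-sample rather than by cross-validation, and the two synthetic classes (white noise and a unit-variance AR(1) process) are invented test data.

```python
import random
import statistics


def extract_features(ts):
    """Tiny illustrative feature set (the paper uses thousands of features)."""
    mean = sum(ts) / len(ts)
    std = statistics.pstdev(ts)
    lag1 = (sum((a - mean) * (b - mean) for a, b in zip(ts, ts[1:]))
            / (len(ts) * std ** 2))
    return {"mean": mean, "std": std, "lag1_autocorr": lag1}


def centroid_accuracy(table, labels, names):
    """In-sample accuracy of a nearest-class-mean rule on the named features."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        rows = [row for row, lab in zip(table, labels) if lab == c]
        centroids[c] = [sum(r[n] for r in rows) / len(rows) for n in names]
    correct = 0
    for row, lab in zip(table, labels):
        predicted = min(
            classes,
            key=lambda c: sum((row[n] - m) ** 2 for n, m in zip(names, centroids[c])),
        )
        correct += predicted == lab
    return correct / len(labels)


def greedy_forward_selection(table, labels, max_features=2):
    """Add the single best feature per round; stop when accuracy stops improving."""
    selected, best_score = [], 0.0
    candidates = list(table[0])
    while candidates and len(selected) < max_features:
        scored = [(centroid_accuracy(table, labels, selected + [n]), n)
                  for n in candidates]
        score, name = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(name)
        candidates.remove(name)
        best_score = score
    return selected, best_score


def ar1(rng, phi, n):
    """AR(1) series with unit marginal variance; phi = 0 gives white noise."""
    scale = (1.0 - phi ** 2) ** 0.5
    x, out = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = phi * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(1)
series = [ar1(rng, 0.0, 200) for _ in range(30)] + [ar1(rng, 0.8, 200) for _ in range(30)]
labels = ["noise"] * 30 + ["ar1"] * 30
table = [extract_features(ts) for ts in series]
selected, accuracy = greedy_forward_selection(table, labels)
```

Because both synthetic classes share the same mean and variance, the selection should favour the lag-1 autocorrelation, which is the kind of interpretable outcome the abstract highlights. Note that the classifier never compares two time series directly, only their feature vectors, so series of different lengths pose no difficulty.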
The performance of the multifractal detrended analysis on short time series is evaluated for synthetic samples of several mono- and multifractal models. The reconstruction of the generalized Hurst exponents is used to determine the range of applicability of the method and the precision of its results as a function of the decreasing length of the series. As an application the series of the daily exchange rate between the U.S. dollar and the euro is studied.
When dealing with non-stationary systems for which many time series are available, it is common to divide time into epochs, i.e. smaller time intervals, and to work with short time series in the hope of having some form of approximate stationarity on that time scale. We can then study time evolution by looking at properties as a function of the epochs. This leads to singular correlation matrices and thus poor statistics. In the present paper, we propose an ensemble technique to deal with a large set of short time series without any consideration of non-stationarity. We randomly select subsets of time series and thus create an ensemble of non-singular correlation matrices. As the selection possibilities are binomially large, we will obtain good statistics for eigenvalues of correlation matrices, which are typically not independent. Once we have defined the ensemble, we analyze its behavior for constant and block-diagonal correlations and compare numerics with analytic results for the corresponding correlated Wishart ensembles. We discuss differences resulting from spurious correlations due to repetitive use of time series. The usefulness of this technique should extend beyond the stationary case if, on the time scale of the epochs, we have quasi-stationarity at least for most epochs.
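The subset idea can be sketched as follows, under assumptions chosen purely for illustration: 20 uncorrelated Gaussian series observed over a single epoch of 10 time steps, random subsets of 4 series, and 200 draws. With more series than time steps the full correlation matrix is singular, whereas each subset matrix (fewer series than time steps) is not. The function names and the small Jacobi eigenvalue routine are hypothetical helpers, not taken from the paper.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length series."""
    z = []
    for r in rows:
        m = sum(r) / len(r)
        norm = math.sqrt(sum((x - m) ** 2 for x in r))
        z.append([(x - m) / norm for x in r])
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


rng = random.Random(2)
n_series, epoch_length, subset_size, n_draws = 20, 10, 4, 200  # assumed sizes
epoch = [[rng.gauss(0.0, 1.0) for _ in range(epoch_length)] for _ in range(n_series)]

# More series than time steps: the full matrix has many (numerically) zero eigenvalues.
full_spectrum = symmetric_eigenvalues(correlation_matrix(epoch))

# Ensemble of small matrices built from randomly chosen subsets of the series.
ensemble = []
for _ in range(n_draws):
    subset = rng.sample(epoch, subset_size)
    ensemble.append(symmetric_eigenvalues(correlation_matrix(subset)))
pooled_eigenvalues = sorted(ev for spectrum in ensemble for ev in spectrum)
```

The pooled eigenvalues give a much larger sample than any single matrix, but, as the abstract notes, they are not independent because the same series appear in many subsets.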