Pattern Discovery in Time Series with Byte Pair Encoding


Published by Nazgol Tavabi

Publication date: 2021

Research field: Electronic Engineering, Informatics Engineering

Paper language: English

Authors: Nazgol Tavabi, Kristina Lerman

Signal Processing, Machine Learning


Abstract

The growing popularity of wearable sensors has generated large quantities of temporal physiological and activity data. The ability to analyze this data offers new opportunities for real-time health monitoring and forecasting. However, temporal physiological data presents many analytic challenges: the data is noisy, contains many missing values, and each series has a different length. Most methods proposed for time series analysis and classification do not handle datasets with these characteristics, nor do they offer interpretability and explainability, a critical requirement in the health domain. We propose an unsupervised method for learning representations of time series based on common patterns identified within them. The patterns are interpretable, variable in length, and extracted using the Byte Pair Encoding compression technique. In this way, the method can capture both long-term and short-term dependencies present in the data. We show that this method applies to both univariate and multivariate time series and outperforms state-of-the-art approaches on a real-world dataset collected from wearable sensors.
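The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of how Byte Pair Encoding could be applied to a single time series. It assumes the series is first discretized into equal-width bins and that the most frequent adjacent pair of tokens is merged repeatedly. The function names (`discretize`, `bpe_patterns`), the binning scheme, and the parameters `n_bins`, `n_merges`, and `min_count` are illustrative assumptions, not the authors' implementation, and missing values are not handled.

```python
from collections import Counter


def discretize(series, n_bins=4):
    """Map each value to a 1-tuple holding its bin index.

    Equal-width binning is an assumption made for this sketch.
    """
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant series
    return [(min(int((x - lo) / width), n_bins - 1),) for x in series]


def most_frequent_pair(tokens):
    """Return ((left, right), count) for the most common adjacent pair, or None."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one longer token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])  # concatenate the tuples
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe_patterns(series, n_bins=4, n_merges=10, min_count=2):
    """Run up to `n_merges` BPE merges; return the final tokens and learned patterns."""
    tokens = discretize(series, n_bins)
    patterns = []
    for _ in range(n_merges):
        best = most_frequent_pair(tokens)
        if best is None or best[1] < min_count:
            break  # no pair repeats often enough to be worth merging
        pair, _count = best
        tokens = merge_pair(tokens, pair)
        patterns.append(pair[0] + pair[1])
    return tokens, patterns


tokens, patterns = bpe_patterns([0, 1, 0, 1, 0, 1, 0, 1], n_bins=2)
print(patterns)  # [(0, 1), (0, 1, 0, 1)]
print(tokens)    # [(0, 1, 0, 1), (0, 1, 0, 1)]
```

Each learned pattern is a variable-length sequence of bin indices, which is what makes it readable. One plausible way to turn this into a representation is to count how often each pattern occurs in a series, for example with `Counter(tokens)`, though the abstract does not confirm that this is how the representation is built.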


Tongge Huang, Pranamesh Chakraborty, Anuj Sharma (2020)

Sufficient high-quality traffic data are a crucial component of various Intelligent Transportation System (ITS) applications and research related to congestion prediction, speed prediction, incident detection, and other traffic operation tasks. Nonetheless, missing traffic data are a common issue in sensor data, inevitable for several reasons such as malfunctioning, poor maintenance or calibration, and intermittent communications. Such missing data issues often make data analysis and decision-making complicated and challenging. In this study, we have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TSDIGAN) to efficiently reconstruct the missing data by generating realistic synthetic data. In recent years, GANs have shown impressive success in image data generation. However, generating traffic data with GAN-based modeling is a challenging task, since traffic data have strong time dependency. To address this problem, we propose a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF) that converts the problem of traffic time-series data generation into that of image generation. We have evaluated and tested our proposed model using the benchmark dataset provided by Caltrans Performance Management Systems (PeMS). This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset. Further, the model achieves reasonably high accuracy in imputation tasks even under a very high missing data rate (> 50%), which shows the robustness and efficiency of the proposed model.
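To make the series-to-image step concrete, here is a minimal sketch of a Gramian Angular Summation Field as it is commonly defined: rescale the series to [-1, 1], map each value to an angle with arccos, and fill entry (i, j) with the cosine of the sum of angles i and j. The function name `gasf` is an illustrative choice, and the exact variant used in TSDIGAN may differ from this common definition.

```python
import math


def gasf(series):
    """Return the GASF matrix of a 1-D series as a list of lists.

    Assumes the common definition: G[i][j] = cos(phi_i + phi_j),
    where phi = arccos(x) and x is the series rescaled to [-1, 1].
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1] so that arccos is defined for every point.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoot before arccos.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


image = gasf([0.0, 1.0, 2.0])
for row in image:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and its diagonal equals 2x² − 1 for each rescaled value x, so the rescaled series can be recovered from the image up to sign. That property is one reason such an encoding suits imputation pipelines that generate images and then map them back to a series.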

Signal Processing, Machine Learning

Probabilistic structure discovery in time series data

David Janz, Brooks Paige, Tom Rainforth (2016)

Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.

Machine Learning

Extraction of instantaneous frequencies and amplitudes in nonstationary time-series data

Daniel E. Shea, Rajiv Giridharagopal, David S. Ginger (2021)

Time-series analysis is critical for a diversity of applications in science and engineering. By leveraging the strengths of modern gradient descent algorithms, the Fourier transform, multi-resolution analysis, and Bayesian spectral analysis, we propose a data-driven approach to time-frequency analysis that circumvents many of the shortcomings of classic approaches, including the extraction of nonstationary signals with discontinuities in their behavior. The method introduced is equivalent to a nonstationary Fourier mode decomposition (NFMD) for nonstationary and nonlinear temporal signals, allowing for the accurate identification of instantaneous frequencies and their amplitudes. The method is demonstrated on a diversity of time-series data, including data from cantilever-based electrostatic force microscopy to quantify the time-dependent evolution of charging dynamics at the nanoscale.

Signal Processing, Machine Learning, Numerical Analysis

Entropy-based Discovery of Summary Causal Graphs in Time Series

Karim Assaad, Emilie Devijver, Eric Gaussier (2021)

In this study, we address the problem of learning a summary causal graph on time series with potentially different sampling rates. To do so, we first propose a new temporal mutual information measure defined on a window-based representation of time series. We then show how this measure relates to an entropy reduction principle that can be seen as a special case of the Probabilistic Raising Principle. We finally combine these two ingredients in a PC-like algorithm to construct the summary causal graph. This algorithm is evaluated on several datasets, which show both its efficacy and efficiency.

Artificial Intelligence, Machine Learning

Sampling and Reconstruction of Bandlimited Signals with Multi-Channel Time Encoding

Karen Adam, Adam Scholefield, Martin Vetterli (2019)

Sampling is classically performed by recording the amplitude of an input signal at given time instants; however, sampling and reconstructing a signal using multiple devices in parallel becomes a more difficult problem to solve when the devices have an unknown shift in their clocks. Alternatively, one can record the times at which a signal (or its integral) crosses given thresholds. This can model integrate-and-fire neurons, for example, and has been studied by Lazar and Toth under the name of "Time Encoding Machines". This sampling method is closer to what is found in nature. In this paper, we show that, when using time encoding machines, reconstruction from multiple channels has a more intuitive solution, and does not require the knowledge of the shifts between machines. We show that, if single-channel time encoding can sample and perfectly reconstruct a $\mathbf{2\Omega}$-bandlimited signal, then $\mathbf{M}$-channel time encoding with shifted integrators can sample and perfectly reconstruct a signal with $\mathbf{M}$ times the bandwidth. Furthermore, we present an algorithm to perform this reconstruction and prove that it converges to the correct unique solution, in the noiseless case, without knowledge of the relative shifts between the integrators of the machines. This is quite unlike classical multi-channel sampling, where unknown shifts between sampling devices pose a problem for perfect reconstruction.
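As a rough illustration of recording threshold-crossing times rather than amplitudes, the sketch below simulates a single integrate-and-fire encoder with a simple Riemann sum. The function name `time_encode` and the parameters `bias`, `threshold`, and `dt` are assumptions chosen for this example. It is not the multi-channel scheme or the reconstruction algorithm from the paper.

```python
import math


def time_encode(signal, t_end, bias=1.5, threshold=0.1, dt=1e-4):
    """Return the times at which the running integral of signal(t) + bias
    reaches `threshold`; the integral is reduced by `threshold` after each spike.

    `bias` should exceed the largest |signal(t)| so the integrand stays positive
    and the encoder keeps firing (an assumption of this sketch).
    """
    spikes, integral, t = [], 0.0, 0.0
    while t < t_end:
        integral += (signal(t) + bias) * dt  # Riemann-sum integration step
        t += dt
        if integral >= threshold:
            spikes.append(t)
            integral -= threshold  # reset while keeping any overshoot
    return spikes


# A sinusoid fires faster where the signal is high and slower where it is low.
spike_times = time_encode(lambda t: math.sin(2 * math.pi * t), t_end=1.0)
print(len(spike_times), spike_times[:3])
```

For a constant input, the spacing between spikes is approximately `threshold / (signal + bias)`, which shows how the signal amplitude is carried entirely by the spike timing.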

Signal Processing