AutoML Meets Time Series Regression Design and Analysis of the AutoSeries Challenge


Posted by Zhen Xu

Publication date: 2021

Research field: Information Engineering

Paper language: English

Authors: Zhen Xu, Wei-Wei Tu, Isabelle Guyon

Machine Learning


Abstract

Analyzing time series better with limited human effort is of interest to both academia and industry. Driven by business scenarios, we organized the first Automated Time Series Regression challenge (AutoSeries) for the WSDM Cup 2020. We present its design, analysis, and post-hoc experiments. Because code submission was required, participants could not intervene manually. This tested the automated machine learning capabilities of solutions across many datasets under hardware and time limits. We prepared 10 datasets from diverse application domains (sales, power consumption, air quality, traffic, and parking). They feature missing data, mixed continuous and categorical variables, and various sampling rates. Each dataset was split into a training sequence and a test sequence. The test sequence was streamed, which allowed models to adapt continuously. Time series regression differs from classical forecasting in that the covariates at the present time are known. Participants made great strides on this AutoSeries problem. This is shown by the jump in performance over the sample submission and by post-hoc comparisons with AutoGluon. They used simple yet effective methods based on feature engineering, LightGBM, and random-search hyper-parameter tuning, addressing all aspects of the challenge. Our post-hoc analyses revealed that providing additional time did not yield significant improvements. The winners' code was open-sourced: https://www.4paradigm.com/competition/autoseries2020.
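The abstract names the winning ingredients (feature engineering, LightGBM, random-search tuning) but gives no detail. The Python sketch below illustrates the time series regression setting under stated assumptions:

- Closed-form ridge regression stands in for LightGBM, so the example needs only NumPy.
- The features are just the current covariates plus a few lagged targets.
- The helper names (`make_features`, `random_search`, `stream_predict`), the search ranges, and the 80/20 chronological holdout are hypothetical choices. They are not the challenge's code or the winners' code.

```python
import numpy as np

def make_features(y, X, n_lags):
    """Pair each target y[t] with the current covariates X[t] and lags y[t-1..t-n_lags].

    Time series regression differs from pure forecasting in that X[t] is known
    when y[t] is predicted, so it is used directly as an input.
    """
    y = np.asarray(y, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(y)):
        lags = y[t - n_lags:t][::-1]                      # y[t-1], ..., y[t-n_lags]
        rows.append(np.concatenate([X[t], lags, [1.0]]))  # 1.0 = intercept term
        targets.append(y[t])
    return np.array(rows), np.array(targets)

def fit_ridge(F, target, alpha):
    """Closed-form ridge regression, standing in for LightGBM in this sketch."""
    A = F.T @ F + alpha * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ target)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def random_search(y, X, n_trials=20, holdout=0.2, seed=0):
    """Random search over (n_lags, alpha), scored on the chronologically last rows."""
    rng = np.random.default_rng(seed)
    split = int((1 - holdout) * len(y))
    best = None
    for _ in range(n_trials):
        n_lags = int(rng.integers(1, 8))         # 1..7 lags
        alpha = float(10 ** rng.uniform(-4, 2))  # log-uniform ridge penalty
        F, target = make_features(y, X, n_lags)
        cut = split - n_lags                     # row i predicts y[i + n_lags]
        w = fit_ridge(F[:cut], target[:cut], alpha)
        score = rmse(F[cut:] @ w, target[cut:])
        if best is None or score < best[0]:
            best = (score, n_lags, alpha)
    return best

def stream_predict(y_train, X_train, y_test, X_test, n_lags, alpha):
    """Fit once on the training sequence, then predict the streamed test sequence
    step by step. Each true value is revealed after its prediction and feeds the
    lag features of later steps (no refitting in this sketch)."""
    F, target = make_features(y_train, X_train, n_lags)
    w = fit_ridge(F, target, alpha)
    history = list(y_train)
    preds = []
    for x_t, y_t in zip(X_test, y_test):
        lags = np.array(history[-n_lags:][::-1])
        preds.append(float(np.concatenate([x_t, lags, [1.0]]) @ w))
        history.append(y_t)  # ground truth arrives only after the prediction
    return np.array(preds)
```

The streaming loop only reuses revealed values as lag features. A fuller solution could also refit periodically as new data arrives, which a streamed test sequence makes possible.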


Hugo Jair Escalante, Wei-Wei Tu, Isabelle Guyon (2019)

We organized a competition on Autonomous Lifelong Machine Learning with Drift as part of the NeurIPS 2018 competition program. This data-driven competition asked participants to develop computer programs that solve supervised learning problems in which the i.i.d. assumption did not hold. Large datasets were arranged in a lifelong learning and evaluation scenario, and CodaLab was used as the challenge platform. The challenge attracted more than 300 participants over its two-month duration. This chapter describes the design of the challenge and summarizes its main results.

Machine Learning

When Ramanujan meets time-frequency analysis in complicated time series analysis

Ziyu Chen, Hau-Tieng Wu (2020)

To handle time series with complicated oscillatory structure, we propose a novel time-frequency (TF) analysis tool that fuses the short-time Fourier transform (STFT) and the periodic transform (PT). Many time series oscillate with time-varying frequency and amplitude and with non-sinusoidal oscillatory patterns, so a direct application of PT or STFT might not be suitable. However, we show that combining them in a proper way yields a powerful TF analysis tool. We first combine Ramanujan sums and $l_1$ penalization to implement the PT; we call the algorithm the Ramanujan PT (RPT). The RPT is of independent interest for other applications, such as analyzing short signals composed of components with integer periods, but that is not the focus of this paper. Second, the RPT is applied to modify the STFT and generate a novel TF representation of the complicated time series. This representation faithfully reflects the instantaneous frequency information of each oscillatory component. We call the proposed TF analysis the Ramanujan de-shape (RDS) and the vectorized RDS (vRDS). In addition to presenting preliminary analysis results on complicated biomedical signals, we provide a theoretical analysis of the RPT. Specifically, we show that the RPT is robust to three commonly encountered types of noise: envelope fluctuation, jitter, and additive noise.

Signal Processing; Data Analysis, Statistics and Probability; Methodology
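The RPT abstract mentions Ramanujan sums and $l_1$ penalization but gives no formulas. As a hedged illustration only, the Python sketch below does three things:

- It computes the Ramanujan sum $c_q(n)=\sum_{1\le k\le q,\ \gcd(k,q)=1}\cos(2\pi kn/q)$.
- It stacks shifted copies $c_q(n-l)$, $l=0,\dots,\varphi(q)-1$, into a periodicity dictionary. This Ramanujan-subspace construction is assumed here rather than taken from the paper.
- It solves the $l_1$-penalized fit with plain ISTA.

The function names and the penalty `lam` are hypothetical. This is not the authors' RPT.

```python
import math
import numpy as np

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1.
    The exact value is always an integer, so floating-point error is rounded away."""
    s = sum(math.cos(2 * math.pi * k * n / q)
            for k in range(1, q + 1) if math.gcd(k, q) == 1)
    return int(round(s))

def euler_phi(q):
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)

def ramanujan_dictionary(q_max, length):
    """Columns c_q(n - l), l = 0..phi(q)-1, spanning the Ramanujan subspace for each q."""
    cols, labels = [], []
    for q in range(1, q_max + 1):
        for l in range(euler_phi(q)):
            cols.append([ramanujan_sum(q, n - l) for n in range(length)])
            labels.append(q)
    return np.array(cols, dtype=float).T, np.array(labels)

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min 0.5*||A x - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

def period_energy(y, q_max, lam=0.1):
    """Energy of the sparse reconstruction carried by each candidate period q."""
    A, labels = ramanujan_dictionary(q_max, len(y))
    x = ista(A, y, lam)
    return {q: float(np.sum((A[:, labels == q] @ x[labels == q]) ** 2))
            for q in range(1, q_max + 1)}
```

When the window length is a multiple of every candidate period (60 covers q = 1..6), the subspaces for different q are mutually orthogonal. A zero-mean period-3 signal such as `np.tile([1.0, -1.0, 0.0], 20)` therefore puts essentially all of its energy in the q = 3 block.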

Improving Neural Networks for Time Series Forecasting using Data Augmentation and AutoML

Indrajeet Y. Javeri, Mohammadhossein Toutiaee, Ismailcem B. Arpinar (2021)

Statistical methods such as the Box-Jenkins method for time-series forecasting have been prominent since their development in 1970. Many researchers rely on such models because they can be estimated efficiently and are interpretable. However, advances in machine learning research indicate that neural networks can be powerful data modeling techniques, as they can achieve higher accuracy on a plethora of learning problems and datasets. Neural networks have been tried on time-series forecasting in the past. However, their overall results have not been significantly better than those of statistical models, especially for intermediate-length time series. Their modeling capacity is limited when there is not enough data to estimate the large number of parameters these non-linear models require. This paper presents an easy-to-implement data augmentation method that significantly improves the performance of such networks. Our method, Augmented-Neural-Network, uses forecasts from statistical models. It can help unlock the power of neural networks on intermediate-length time series and produces competitive results. The work shows that data augmentation, when paired with Automated Machine Learning techniques such as Neural Architecture Search, can help find the best neural architecture for a given time series. Combining these techniques significantly improves the forecasting accuracy of three neural network-based models on a COVID-19 dataset. The maximum improvements are 21.41%, 24.29%, and 16.42%, respectively, over the same neural networks without augmented data.

Machine Learning; Statistics Applications; Methodology

Feature-based time-series analysis

Ben D. Fulcher (2017)

This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.

Machine Learning
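To make the feature-based idea concrete, the Python sketch below maps a series of any length to a fixed set of interpretable summary statistics: mean, standard deviation, lag-1 autocorrelation, and linear trend slope. These four features are an arbitrary, hypothetical selection for illustration. The article surveys a much wider range of feature-based representations.

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    denom = np.sum(x * x)
    return float(np.sum(x[:-lag] * x[lag:]) / denom) if denom > 0 else 0.0

def feature_vector(x):
    """Map a series of any length to a fixed-length, interpretable feature dict."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # least-squares linear trend per time step
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "acf1": lag_autocorr(x, 1),
        "trend_slope": float(slope),
    }
```

Once every series is a vector of the same features, standard machine learning tools (distance metrics, clustering, classifiers) can operate on a whole dataset of series, regardless of their lengths.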

Efficient and Consistent Robust Time Series Analysis

Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban (2016)

We study the problem of robust time series analysis under the standard auto-regressive (AR) time series model in the presence of arbitrary outliers. We devise an efficient hard-thresholding-based algorithm that obtains a consistent estimate of the optimal AR model even when a large fraction of the time series points are corrupted. Our algorithm alternately estimates the set of corrupted points and the model parameters. It is inspired by recent advances in robust regression and hard-thresholding methods. However, a critical difference in the time-series domain hinders a direct application of existing techniques: each point is correlated with all previous points, so existing tools do not apply directly. We show how to overcome this hurdle using novel proof techniques. Using these techniques, we also provide the first efficient and provably consistent estimator for the robust regression problem in which a standard linear observation model with white additive noise is corrupted arbitrarily. We illustrate our methods on synthetic datasets and show that they consistently recover the optimal parameters despite a large fraction of points being corrupted.

Machine Learning
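The robust AR abstract describes its algorithm only at a high level: it alternates between estimating the corrupted set and the model parameters. The Python sketch below is a generic alternating hard-thresholding loop for an AR(p) model, written under these simplifying assumptions:

- The number of corrupted points is known in advance.
- Each one-step residual is attributed to the point being predicted.
- Any regression row that touches a flagged point is discarded.

The names `lag_matrix` and `robust_ar` are hypothetical. This is not the authors' estimator and carries none of its consistency guarantees.

```python
import numpy as np

def lag_matrix(y, p):
    """Rows [y[t-1], ..., y[t-p]] paired with targets y[t], for t = p..n-1."""
    n = len(y)
    X = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    return X, y[p:]

def robust_ar(y, p, n_outliers, n_iter=20):
    """Alternating hard-thresholding sketch for AR(p) with sparse outliers.

    Step 1: least-squares AR fit on rows that touch no flagged point.
    Step 2: flag the n_outliers points whose one-step residual is largest.
    """
    assert n_outliers >= 1
    y = np.asarray(y, dtype=float)
    X, target = lag_matrix(y, p)
    n = len(y)
    flagged = np.zeros(n, dtype=bool)
    w = np.zeros(p)
    for _ in range(n_iter):
        # The row for target y[t] uses points t-p..t; drop it if any is flagged.
        touched = np.array([flagged[t - p:t + 1].any() for t in range(p, n)])
        w, *_ = np.linalg.lstsq(X[~touched], target[~touched], rcond=None)
        resid = np.abs(target - X @ w)
        new_flagged = np.zeros(n, dtype=bool)
        new_flagged[p + np.argsort(resid)[-n_outliers:]] = True  # hard threshold
        if np.array_equal(new_flagged, flagged):
            break
        flagged = new_flagged
    return w, np.flatnonzero(flagged)
```

Flagging points rather than individual rows matters here. A corrupted point spoils both the row that predicts it and the rows that use it as a lag, so discarding every row that touches a flagged point removes both effects at once.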