Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification

130 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Kaichen Zhou

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Kaichen Zhou - Shiji Song - Gao Huang

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In large-scale classification problems, the data set always be faced with frequent updates when a part of the data is added to or removed from the original data set. In this case, conventional incremental learning, which updates an existing classifier by explicitly modeling the data modification, is more efficient than retraining a new classifier from scratch. However, sometimes, we are more interested in determining whether we should update the classifier or performing some sensitivity analysis tasks. To deal with these such tasks, we propose an algorithm to make rational inferences about the updated linear classifier without exactly updating the classifier. Specifically, the proposed algorithm can be used to estimate the upper and lower bounds of the updated classifiers coefficient matrix with a low computational complexity related to the size of the updated dataset. Both theoretical analysis and experiment results show that the proposed approach is superior to existing methods in terms of tightness of coefficients bounds and computational complexity.

قيم البحث

85 - Giovanni Cherubin , Konstantinos Chatzikokolakis , Martin Jaggi 2021

Conformal Predictors (CP) are wrappers around ML methods, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. Unfortunate ly, their high computational complexity limits their applicability to large datasets. In this work, we show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental&decremental learning. For methods such as k-NN, KDE, and kernel LS-SVM, our approach reduces the running time by one order of magnitude, whilst producing exact solutions. With similar ideas, we also achieve a linear speed up for the harder case of bootstrapping. Finally, we extend these techniques to improve upon an optimization of k-NN CP for regression. We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.

التعلم الآلي التحسين والتحكم

Incremental Fast Subclass Discriminant Analysis

138 - Kateryna Chumachenko , Jenni Raitoharju , Moncef Gabbouj 2020

This paper proposes an incremental solution to Fast Subclass Discriminant Analysis (fastSDA). We present an exact and an approximate linear solution, along with an approximate kernelized variant. Extensive experiments on eight image datasets with dif ferent incremental batch sizes show the superiority of the proposed approach in terms of training time and accuracy being equal or close to fastSDA solution and outperforming other methods.

التعلم الآلي التعلم الالي

Incremental and Decremental Secret Key Agreement

370 - Chung Chan , Ali Al-Bashabsheh , Qiaoqiao Zhou 2016

We study the rate of change of the multivariate mutual information among a set of random variables when some common randomness is added to or removed from a subset. This is formulated more precisely as two new multiterminal secret key agreement probl ems which ask how one can increase the secrecy capacity efficiently by adding common randomness to a small subset of users, and how one can simplify the source model by removing redundant common randomness that does not contribute to the secrecy capacity. The combinatorial structure has been clarified along with some meaningful open problems.

نظرية المعلومات نظرية المعلومات

Evaluating and Characterizing Incremental Learning from Non-Stationary Data

63 - Alejandro Cervantes , Christian Gagne , Pedro Isasi 2018

Incremental learning from non-stationary data poses special challenges to the field of machine learning. Although new algorithms have been developed for this, assessment of results and comparison of behaviors are still open problems, mainly because e valuation metrics, adapted from more traditional tasks, can be ineffective in this context. Overall, there is a lack of common testing practices. This paper thus presents a testbed for incremental non-stationary learning algorithms, based on specially designed synthetic datasets. Also, test results are reported for some well-known algorithms to show that the proposed methodology is effective at characterizing their strengths and weaknesses. It is expected that this methodology will provide a common basis for evaluating future contributions in the field.

التعلم الآلي التعلم الالي

An Incremental Clustering Method for Anomaly Detection in Flight Data

71 - Weizun Zhao 2020

Safety is a top priority for civil aviation. Data mining in digital Flight Data Recorder (FDR) or Quick Access Recorder (QAR) data, commonly referred as black box data on aircraft, has gained interest from researchers, airlines, and aviation regulati on agencies for safety management. New anomaly detection methods based on supervised or unsupervised learning have been developed to monitor pilot operations and detect any risks from onboard digital flight data recorder data. However, all existing anomaly detection methods are offline learning - the models are trained once using historical data and used for all future predictions. In practice, new QAR data are generated by every flight and collected by airlines whenever a datalink is available. Offline methods cannot respond to new data in time. Though these offline models can be updated by being re-trained after adding new data to the original training set, it is time-consuming and computational costly to train a new model every time new data come in. To address this problem, we propose a novel incremental anomaly detection method to identify common patterns and detect outliers in flight operations from FDR data. The proposed method is based on Gaussian Mixture Model (GMM). An initial GMM cluster model is trained on historical offline data. Then, it continuously adapts to new incoming data points via an expectation-maximization (EM) algorithm. To track changes in flight operation patterns, only model parameters need to be saved, not the raw flight data. The proposed method was tested on two sets of simulation data. Comparable results were found from the proposed online method and a classic offline model. A real-world application of the proposed method is demonstrated using FDR data from daily operations of an airline. Results are presented and future challenges of using online learning scheme for flight data analytics are discussed.

التعلم الآلي التعلم الالي