In this paper, we propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected sub-sample size $s$, and the bagging rounds $B$, optimal convergence rates for under-bagging $k$-NN can be achieved under mild assumptions w.r.t.~the arithmetic mean (AM) of recalls. Moreover, we show that with a relatively small $B$, the expected sub-sample size $s$ can be much smaller than the number of training samples $n$ at each bagging round, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced, which leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments that verify the theoretical results and demonstrate the benefits of the under-bagging technique through the promising AM performance and efficiency of the proposed algorithm.
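To make the procedure concrete, here is a minimal Python sketch of under-bagging $k$-NN for a binary problem: each of the $B$ rounds under-samples both classes to an expected sub-sample size $s$, fits a $k$-NN classifier, and the rounds vote. The parameter names $k$, $s$, $B$ follow the abstract, but the implementation details (balanced 50/50 sampling, majority vote) are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn_predict(X_train, y_train, X_test, k=5, s=200, B=10,
                              rng=np.random.default_rng(0)):
    """Sketch of under-bagging k-NN; assumes numpy arrays, labels in {0, 1},
    class 1 the minority, and each class holding at least s // 2 points."""
    minority = np.flatnonzero(y_train == 1)
    majority = np.flatnonzero(y_train == 0)
    votes = np.zeros(len(X_test))
    for _ in range(B):
        # Under-sample each class so the sub-sample is balanced with size s.
        idx = np.concatenate([
            rng.choice(minority, size=s // 2, replace=False),
            rng.choice(majority, size=s // 2, replace=False),
        ])
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train[idx], y_train[idx])
        votes += knn.predict(X_test)
    return (votes / B >= 0.5).astype(int)  # majority vote over the B rounds
```

Because each round only searches neighbors within the sub-sample of size $s \ll n$, the per-query cost drops roughly from $O(n)$ to $O(Bs)$, which is the source of the time-complexity gain the abstract describes.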
We consider a data corruption scenario in the classical $k$-nearest neighbors ($k$-NN) algorithm, in which the testing data are randomly perturbed. Under such a scenario, the impact of the corruption level on the asymptotic regret is carefully characterized. In particular, our theoretical analysis reveals a phase transition phenomenon: when the corruption level $\omega$ is below a critical order (i.e., the small-$\omega$ regime), the asymptotic regret remains the same; when it is beyond that order (i.e., the large-$\omega$ regime), the asymptotic regret deteriorates polynomially. Surprisingly, we obtain a negative result that the classical noise-injection approach will not help improve the testing performance in the beginning stage of the large-$\omega$ regime, even at the level of the multiplicative constant of the asymptotic regret. As a technical by-product, we prove that under different model assumptions, the pre-processed 1-NN proposed in \cite{xue2017achieving} achieves at most a sub-optimal rate when the data dimension $d>4$, even if $k$ is chosen optimally in the pre-processing step.
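The two regimes can be visualized with a toy simulation in which test inputs are perturbed by uniform noise of magnitude $\omega$ before prediction. The corruption model and the synthetic decision boundary below are illustrative assumptions, not the paper's exact setting; the point is only to observe accuracy staying flat for small $\omega$ and degrading once $\omega$ grows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n, d = 2000, 2
# Synthetic binary problem with a smooth, nonlinear boundary (assumed here).
X = rng.uniform(-1, 1, size=(n, d))
y = (X[:, 0] + 0.3 * np.sin(4 * X[:, 1]) > 0).astype(int)
X_test = rng.uniform(-1, 1, size=(1000, d))
y_test = (X_test[:, 0] + 0.3 * np.sin(4 * X_test[:, 1]) > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=15).fit(X, y)
for omega in [0.0, 0.01, 0.05, 0.2, 0.5]:
    # Corrupt only the test inputs, as in the abstract's scenario.
    X_corrupt = X_test + rng.uniform(-omega, omega, size=X_test.shape)
    acc = (knn.predict(X_corrupt) == y_test).mean()
    print(f"omega={omega:.2f}  accuracy={acc:.3f}")
```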
Most supervised learning models are trained for full automation. However, their predictions are sometimes worse than those of human experts on specific instances. Motivated by this empirical observation, our goal is to design classifiers that are optimized to operate under different automation levels. More specifically, we focus on convex margin-based classifiers and first show that the problem is NP-hard. Then, we further show that, for support vector machines, the corresponding objective function can be expressed as the difference of two functions $f = g - c$, where $g$ is monotone, non-negative, and $\gamma$-weakly submodular, and $c$ is non-negative and modular. This representation allows a recently introduced deterministic greedy algorithm, as well as a more efficient randomized variant, to enjoy approximation guarantees for solving the problem. Experiments on synthetic and real-world data from several applications in medical diagnosis illustrate our theoretical findings and demonstrate that, under human assistance, supervised learning models trained to operate under different automation levels can outperform both those trained for full automation and humans operating alone.
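The greedy scheme the abstract refers to can be sketched generically for maximizing $f(S) = g(S) - c(S)$ with $g$ monotone weakly submodular and $c$ modular: at each step, add the element whose marginal gain in $g$ most exceeds its modular cost. The stopping rule and the `budget` cap below are illustrative choices, not the paper's exact algorithm; `g` and `c` are caller-supplied set functions.

```python
def greedy_difference(ground_set, g, c, budget):
    """Deterministic greedy sketch for max f(S) = g(S) - c(S), S a subset of
    ground_set (a Python set). g maps sets to floats; c maps elements to
    non-negative costs (so c is modular over sets)."""
    S = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for e in ground_set - S:
            # Marginal gain of f from adding e to the current set S.
            gain = g(S | {e}) - g(S) - c(e)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element yields a positive gain: stop early
            break
        S.add(best)
    return S
```

In the paper's setting, $S$ would index the instances outsourced to the human, but that mapping is context the sketch leaves abstract.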
Anomaly detection is not an easy problem, since the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches and leads to better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated on several data sets and compared to a set of conventional one-class and two-class approaches.
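One plausible instantiation of such a one-class/two-class trade-off is a convex combination of a one-class score (trained on normal data only) and a supervised two-class score. The mixing weight `alpha` below is a hypothetical knob interpolating between the two extremes, not the method's exact formulation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM

def blended_anomaly_score(X_normal, X_anom, X_test, alpha=0.5):
    """Illustrative blend: alpha=1 is purely one-class, alpha=0 purely
    two-class. Higher returned scores indicate more anomalous points."""
    oc = OneClassSVM(gamma="scale").fit(X_normal)           # one-class branch
    X = np.vstack([X_normal, X_anom])                       # two-class branch
    y = np.r_[np.zeros(len(X_normal)), np.ones(len(X_anom))]
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    s_one = -oc.decision_function(X_test)   # negate: higher = more anomalous
    s_two = clf.predict_proba(X_test)[:, 1]
    # Rescale the one-class score to [0, 1] before mixing the two scores.
    s_one = (s_one - s_one.min()) / (np.ptp(s_one) + 1e-12)
    return alpha * s_one + (1 - alpha) * s_two
```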
Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues -- prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning -- since these possess distribution-free guarantees for i.i.d. data. Two common ways of generalizing beyond the i.i.d. setting include handling covariate and label shift. Within the context of distribution-free UQ, the former has already received attention, but not the latter. It is known that label shift hurts prediction, and we first argue that it also hurts UQ, by showing degradation in coverage and calibration. Piggybacking on recent progress in addressing label shift (for better prediction), we examine the right way to achieve UQ by reweighting the aforementioned conformal and calibration procedures whenever some unlabeled data from the target distribution is available. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
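A minimal sketch of the reweighted conformal avenue follows, for split conformal prediction under label shift: calibration scores are weighted by estimated class-probability ratios $w(y) = q(y)/p(y)$ between target and source label distributions. In practice these ratios come from unlabeled target data (e.g., black-box shift estimation), but they are assumed given here; the nonconformity score $1 - \hat p(y \mid x)$ is a common choice, not necessarily the paper's.

```python
import numpy as np

def weighted_conformal_sets(probs_cal, y_cal, probs_test, w, alpha=0.1):
    """probs_cal: (n_cal, n_classes) predicted probabilities; y_cal: int
    labels; w: length-n_classes array of estimated q(y)/p(y) ratios.
    Returns a prediction set (list of labels) per test point."""
    scores = 1.0 - probs_cal[np.arange(len(y_cal)), y_cal]
    n_classes = probs_cal.shape[1]
    sets = []
    for p in probs_test:
        pred = []
        for y in range(n_classes):
            # Calibration points weighted by w[true class]; the hypothesized
            # test label y contributes a point at +inf with weight w[y].
            wts = np.r_[w[y_cal], w[y]]
            s = np.r_[scores, np.inf]
            order = np.argsort(s)
            cum = np.cumsum(wts[order]) / wts.sum()
            idx = min(np.searchsorted(cum, 1 - alpha), len(s) - 1)
            q = s[order][idx]                  # weighted (1 - alpha) quantile
            if 1.0 - p[y] <= q:
                pred.append(y)
        sets.append(pred)
    return sets
```

With uniform weights ($w \equiv 1$) this reduces to standard split conformal prediction, which is the sense in which the reweighting "corrects" for label shift.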
Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method ($\ell_1$ minimization). The framework employs a generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of the bootstrap sampling ratio $L/m$ and the number of estimates $K$ are analyzed theoretically. Simulations show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and Bolasso in the challenging case of a low number of measurements. A lower $L/m$ ratio (60%--90%) leads to better performance, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. With a properly chosen sampling ratio, a reasonably small number of estimates $K = 30$ gives satisfactory results, although increasing $K$ always improves, or at least maintains, the performance.
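A compact sketch of the bagged sparse-regression scheme: $K$ sparse estimates, each fit on a bootstrap subsample of $L$ out of $m$ measurements, are averaged. The ratio $L/m$ and $K$ mirror the abstract's notation; using scikit-learn's Lasso as the $\ell_1$ solver and the regularization level `alpha` are generic placeholder choices, not tuned values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bagged_lasso(A, y, L_ratio=0.75, K=30, alpha=0.01,
                 rng=np.random.default_rng(0)):
    """Average K Lasso estimates fit on bootstrap subsamples of the rows of
    the measurement matrix A (shape m x n) and observations y (length m)."""
    m, n = A.shape
    L = int(L_ratio * m)
    x_hat = np.zeros(n)
    for _ in range(K):
        idx = rng.choice(m, size=L, replace=True)  # bootstrap L of m rows
        est = Lasso(alpha=alpha, max_iter=10_000).fit(A[idx], y[idx])
        x_hat += est.coef_
    return x_hat / K  # average the K sparse estimates
```

Setting `L_ratio` in the 0.6--0.9 range mimics the reduced sampling ratios the abstract reports as most beneficial when measurements are scarce.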