Do you want to publish a course? Click here

Ensemble Quantile Classifier

145   0   0.0 ( 0 )
 Added by Yuanhao Lai
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. But these methods are restricted as they assign equal weight to each variable in an unregularized way. The ensemble quantile classifier is a more flexible regularized classifier that provides better performance with high-dimensional data, asymmetric data or when there are many irrelevant extraneous inputs. The improved performance is demonstrated by a simulation study as well as an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.



rate research

Read More

Random forests are powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on a new estimating equation that adapts to censoring and leads to quantile score whenever the data do not exhibit censoring. The proposed procedure named {it censored quantile regression forest}, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
We consider a problem of multiclass classification, where the training sample $S_n = {(X_i, Y_i)}_{i=1}^n$ is generated from the model $mathbb P(Y = m | X = x) = eta_m(x)$, $1 leq m leq M$, and $eta_1(x), dots, eta_M(x)$ are unknown $alpha$-Holder continuous functions.Given a test point $X$, our goal is to predict its label. A widely used $mathsf k$-nearest-neighbors classifier constructs estimates of $eta_1(X), dots, eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $mathsf k$, which may become tricky in some situations. In our solution, we fix several integers $n_1, dots, n_K$, compute corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $alpha$ and to the margin and establish rates of convergence under mild assumptions.
121 - Ning Ning 2020
In this paper, we propose the multivariate quantile Bayesian structural time series (MQBSTS) model for the joint quantile time series forecast, which is the first such model for correlated multivariate time series to the authors best knowledge. The MQBSTS model also enables quantile based feature selection in its regression component where each time series has its own pool of contemporaneous external time series predictors, which is the first time that a fully data-driven quantile feature selection technique applicable to time series data to the authors best knowledge. Different from most machine learning algorithms, the MQBSTS model has very few hyper-parameters to tune, requires small datasets to train, converges fast, and is executable on ordinary personal computers. Extensive examinations on simulated data and empirical data confirmed that the MQBSTS model has superior performance in feature selection, parameter estimation, and forecast.
Given a binary prediction problem, which performance metric should the classifier optimize? We address this question by formalizing the problem of Metric Elicitation. The goal of metric elicitation is to discover the performance metric of a practitioner, which reflects her innate rewards (costs) for correct (incorrect) classification. In particular, we focus on eliciting binary classification performance metrics from pairwise feedback, where a practitioner is queried to provide relative preference between two classifiers. By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics. We further show that our method is robust to feedback and finite sample noise.
Generally, the risks associated with malicious threats are increasing for the IIoT and its related applications due to dependency on the Internet and the minimal resource availability of IoT devices. Thus, anomaly-based intrusion detection models for IoT networks are vital. Distinct detection methodologies need to be developed for the IIoT network as threat detection is a significant expectation of stakeholders. Machine learning approaches are considered to be evolving techniques that learn with experience, and such approaches have resulted in superior performance in various applications, such as pattern recognition, outlier analysis, and speech recognition. Traditional techniques and tools are not adequate to secure IIoT networks due to the use of various protocols in industrial systems and restricted possibilities of upgradation. In this paper, the objective is to develop a two-phase anomaly detection model to enhance the reliability of an IIoT network. In the first phase, SVM and Naive Bayes are integrated using an ensemble blending technique. K-fold cross-validation is performed while training the data with different training and testing ratios to obtain optimized training and test sets. Ensemble blending uses a random forest technique to predict class labels. An Artificial Neural Network (ANN) classifier that uses the Adam optimizer to achieve better accuracy is also used for prediction. In the second phase, both the ANN and random forest results are fed to the models classification unit, and the highest accuracy value is considered the final result. The proposed model is tested on standard IoT attack datasets, such as WUSTL_IIOT-2018, N_BaIoT, and Bot_IoT. The highest accuracy obtained is 99%. The results also demonstrate that the proposed model outperforms traditional techniques and thus improves the reliability of an IIoT network.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا