A Dynamic Boosted Ensemble Learning Method Based on Random Forest

135 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xingzhang Ren

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Xingzhang Ren - Chen Long - Leilei Zhang

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose a dynamic boosted ensemble learning method based on random forest (DBRF), a novel ensemble algorithm that incorporates the notion of hard example mining into Random Forest (RF) and thus combines the high accuracy of Boosting algorithm with the strong generalization of Bagging algorithm. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples from training data, we evolve the random forest to focus on hard examples dynamically so as to learn decision boundaries better. Data can be cascaded through these random forests learned in each iteration in sequence to generate predictions, thus making RF deep. We also propose to use evolution mechanism and smart iteration mechanism to improve the performance of the model. DBRF outperforms RF on three UCI datasets and achieved state-of-the-art results compared to other deep models. Moreover, we show that DBRF is also a new way of sampling and can be very useful when learning from imbalanced data.

قيم البحث

اقرأ أيضاً

WildWood: a new Random Forest algorithm

218 - Stephane Gaiffas , Ibrahim Merad , Yiyang Yu 2021

We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights computed over out-of-bag samples, that are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.

التعلم الآلي التعلم الالي

A new multilayer optical film optimal method based on deep q-learning

68 - Anqing Jiang , Osamu Yoshie , LiangYao Chen 2018

Multi-layer optical film has been found to afford important applications in optical communication, optical absorbers, optical filters, etc. Different algorithms of multi-layer optical film design has been developed, as simplex method, colony algorith m, genetic algorithm. These algorithms rapidly promote the design and manufacture of multi-layer films. However, traditional numerical algorithms of converge to local optimum. This means that the algorithms can not give a global optimal solution to the material researchers. In recent years, due to the rapid development to far artificial intelligence, to optimize optical film structure using AI algorithm has become possible. In this paper, we will introduce a new optical film design algorithm based on the deep Q learning. This model can converge the global optimum of the optical thin film structure, this will greatly improve the design efficiency of multi-layer films.

التعلم الآلي التعلم الالي

Dense Adaptive Cascade Forest: A Self Adaptive Deep Ensemble for Classification Problems

110 - Haiyang Wang , Yong Tang , Ziyang Jia 2018

Recent researches have shown that deep forest ensemble achieves a considerable increase in classification accuracy compared with the general ensemble learning methods, especially when the training set is small. In this paper, we take advantage of dee p forest ensemble and introduce the Dense Adaptive Cascade Forest (daForest). Our model has a better performance than the original Cascade Forest with three major features: first, we apply SAMME.R boosting algorithm to improve the performance of the model. It guarantees the improvement as the number of layers increases. Second, our model connects each layer to the subsequent ones in a feed-forward fashion, which enhances the capability of the model to resist performance degeneration. Third, we add a hyper-parameters optimization layer before the first classification layer, making our model spend less time to set up and find the optimal hyper-parameters. Experimental results show that daForest performs significantly well, and in some cases, even outperforms neural networks and achieves state-of-the-art results.

التعلم الآلي التعلم الالي

A novel transfer learning method based on common space mapping and weighted domain matching

133 - Ru-Ze Liang , Wei Xie , Weizhi Li 2016

In this paper, we propose a novel learning framework for the problem of domain transfer learning. We map the data of two domains to one single common space, and learn a classifier in this common space. Then we adapt the common classifier to the two d omains by adding two adaptive functions to it respectively. In the common space, the target domain data points are weighted and matched to the target domain in term of distributions. The weighting terms of source domain data points and the target domain classification responses are also regularized by the local reconstruction coefficients. The novel transfer learning framework is evaluated over some benchmark cross-domain data sets, and it outperforms the existing state-of-the-art transfer learning methods.

التعلم الآلي التعلم الالي

Random Warping Series: A Random Features Method for Time-Series Embedding

121 - Lingfei Wu , Ian En-Hsu Yen , Jinfeng Yi 2018

Time series data analytics has been a problem of substantial interests for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have sinc e been proposed in the spirit of DTW to extend its use to kernel-based estimation method such as support vector machine. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of emph{Random Warping Series (RWS)}. The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoys a emph{Random Features} (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time-series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data is available at { url{https://github.com/IBM/RandomWarpingSeries}}.

التعلم الآلي التعلم الالي