بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Grouped variable importance with random forests and application to multiple functional data analysis

342 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Baptiste Gregorutti

تاريخ النشر 2014

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Baptiste Gregorutti - Bertrand Michel - Philippe Saint-Pierre

المنهجية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The selection of grouped variables using the random forest algorithm is considered. First a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression models. Second, an original method for selecting functional variables based on the grouped variable importance measure is developed. Using a wavelet basis, it is proposed to regroup all of the wavelet coefficients for a given functional variable and use a wrapper selection algorithm with these groups. Various other groupings which take advantage of the frequency and time localization of the wavelet basis are proposed. An extensive simulation study is performed to illustrate the use of the grouped importance measure in this context. The method is applied to a real life problem coming from aviation safety.

قيم البحث

86 - Joshua Daniel Loyal , Ruoqing Zhu , Yifan Cui 2021

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression functions local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package.

المنهجية

Depth-based Outlier Detection for Grouped Smart Meters: a Functional Data Analysis Toolbox

108 - A. Elias , J. M. Morales , S. Pineda 2021

Smart metering infrastructures collect data almost continuously in the form of fine-grained long time series. These massive time series often have common daily patterns that are repeated between similar days or seasons and shared between grouped mete rs. Within this context, we propose a method to highlight individuals with abnormal daily dependency patterns, which we term evolution outliers. To this end, we approach the problem from the standpoint of Functional Data Analysis (FDA), by treating each daily record as a function or curve. We then focus on the morphological aspects of the observed curves, such as daily magnitude, daily shape, derivatives, and inter-day evolution. The proposed method for evolution outliers relies on the concept of functional depth, which has been a cornerstone in the literature of FDA to build shape and magnitude outlier detection methods. In conjunction with our evolution outlier proposal, these methods provide an outlier detection toolbox for smart meter data that covers a wide palette of functional outliers classes. We illustrate the outlier identification ability of this toolbox using actual smart metering data corresponding to photovoltaic energy generation and circuit voltage records.

المنهجية تطبيقات الإحصاء

Variable importance in binary regression trees and forests

412 - Hemant Ishwaran 2007

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends fro m single trees to ensembles of trees and applies to methods like random forests. This is useful because while importance values from random forests are used to screen variables, for example they are used to filter high throughput genomic data in Bioinformatics, very little theory exists about their properties.

التعلم الالي

Multivariate functional responses low rank regression with an application to brain imaging data

115 - Xiucai Ding , Dengdeng Yu , Zhengwu Zhang 2020

We propose a multivariate functional responses low rank regression model with possible high dimensional functional responses and scalar covariates. By expanding the slope functions on a set of sieve basis, we reconstruct the basis coefficients as a m atrix. To estimate these coefficients, we propose an efficient procedure using nuclear norm regularization. We also derive error bounds for our estimates and evaluate our method using simulations. We further apply our method to the Human Connectome Project neuroimaging data to predict cortical surface motor task-evoked functional magnetic resonance imaging signals using various clinical covariates to illustrate the usefulness of our results.

المنهجية تطبيقات الإحصاء

Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data

483 - Weiping Ma , Yang Feng , Kani Chen 2013

Motivated by modeling and analysis of mass-spectrometry data, a semi- and nonparametric model is proposed that consists of a linear parametric component for individual location and scale and a nonparametric regression function for the common shape. A multi-step approach is developed that simultaneously estimates the parametric components and the nonparametric function. Under certain regularity conditions, it is shown that the resulting estimators is consistent and asymptotic normal for the parametric part and achieve the optimal rate of convergence for the nonparametric part when the bandwidth is suitably chosen. Simulation results are presented to demonstrate the effectiveness and finite-sample performance of the method. The method is also applied to a SELDI-TOF mass spectrometry data set from a study of liver cancer patients.

المنهجية نظرية الإحصاء تطبيقات الإحصاء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني لإدارة الأعمال

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Grouped variable importance with random forests and application to multiple functional data analysis

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً