
Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees

Published by: Myriam Tami
Publication date: 2018
Research language: English





Tree-based ensemble methods, such as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended to deal with uncertainty in the output variable, for example by using a quantile loss in Random Forests (Meinshausen, 2006). To the best of our knowledge, no extension has yet been provided for dealing with uncertainties in the input variables, even though such uncertainties are common in practical situations. We propose such an extension here by showing how standard regression trees optimizing a quadratic loss can be adapted and learned while taking the uncertainties in the inputs into account. By doing so, one no longer assumes that an observation lies in a single region of the regression tree, but rather that it belongs to each region with a certain probability. Experiments conducted on several data sets illustrate the good behavior of the proposed extension.
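The core idea, namely that an uncertain observation belongs to each axis-aligned leaf region with some probability and that the tree's output becomes a probability-weighted average of leaf values, can be illustrated with a small sketch. This is not the paper's learning algorithm; it only shows prediction with a hypothetical, already-fitted tree, assuming independent Gaussian uncertainty on each input feature and hand-written leaf boxes (the names `leaves` and `soft_predict` are illustrative).

# Minimal sketch (not the paper's implementation): prediction with a fitted
# axis-aligned regression tree when each input feature is uncertain, modelled
# here as an independent Gaussian per feature. Each leaf is a box with a value;
# the prediction is the probability-weighted average of leaf values.
import numpy as np
from scipy.stats import norm

# Hypothetical leaves of a 2-feature tree: split at x1 = 0.5, then at x2 = 2.0.
leaves = [
    {"low": np.array([-np.inf, -np.inf]), "up": np.array([0.5, np.inf]),    "value": 1.0},
    {"low": np.array([0.5, -np.inf]),     "up": np.array([np.inf, 2.0]),    "value": 3.0},
    {"low": np.array([0.5, 2.0]),         "up": np.array([np.inf, np.inf]), "value": 5.0},
]

def soft_predict(mean, std, leaves):
    """P(leaf) = prod_j P(low_j < X_j <= up_j); prediction = sum over leaves of P(leaf) * value."""
    pred = 0.0
    for leaf in leaves:
        p_leaf = np.prod(norm.cdf(leaf["up"], mean, std) - norm.cdf(leaf["low"], mean, std))
        pred += p_leaf * leaf["value"]
    return pred

x_mean = np.array([0.45, 1.9])   # observed (noisy) input
x_std = np.array([0.10, 0.30])   # assumed per-feature measurement uncertainty
print(soft_predict(x_mean, x_std, leaves))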


Read also

Subjective Logic (SL) is a well-known belief model that can explicitly deal with uncertain opinions and infer unknown opinions from a rich set of operators for fusing multiple opinions. Owing to its simplicity and applicability, SL has been substantially applied to decision making in cybersecurity, opinion models, trust models, and social network analysis. However, SL and its variants have exposed limitations in predicting uncertain opinions on real-world dynamic network data, which are mainly threefold: (1) a lack of scalability to deal with a large-scale network; (2) limited capability to handle heterogeneous topological and temporal dependencies among node-level opinions; and (3) high sensitivity to conflicting evidence, which may generate counterintuitive opinions. In this work, we propose a novel deep learning (DL)-based dynamic opinion inference model in which node-level opinions are still formalized based on SL, meaning that an opinion has a dimension of uncertainty in addition to belief and disbelief in a binomial opinion (i.e., agree or disagree). The proposed DL-based dynamic opinion inference model overcomes the above three limitations by integrating the following techniques: (1) state-of-the-art DL techniques, namely a Graph Convolutional Network (GCN) and Gated Recurrent Units (GRU), for modeling the heterogeneous topological and temporal dependencies of a given dynamic network; (2) modeling conflicting opinions based on robust statistics; and (3) a highly scalable inference algorithm that predicts dynamic, uncertain opinions in linear computation time. We validated the superior performance of our proposed DL-based algorithm (i.e., the GCN-GRU-opinion model) via extensive comparative performance analysis on four real-world datasets.
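To make the GCN-plus-GRU pairing concrete, here is a minimal, framework-free sketch (random placeholder weights, not the paper's GCN-GRU-opinion model): one normalized graph-convolution step per time slice mixes neighbour information, a GRU cell carries node states across time, and a softmax head emits an SL-style (belief, disbelief, uncertainty) triple per node. The graph, features, and all dimensions below are illustrative assumptions.

# Minimal sketch of a GCN layer followed by a GRU cell over time (untrained weights).
import numpy as np

rng = np.random.default_rng(0)
N, F, H, T = 5, 4, 8, 6                                # nodes, input features, hidden size, time steps

A = (rng.random((N, N)) < 0.4).astype(float)           # random adjacency
A = np.maximum(A, A.T)                                 # symmetrize
np.fill_diagonal(A, 0.0)
A_hat = A + np.eye(N)                                  # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt               # symmetric normalisation

W_gcn = rng.normal(scale=0.1, size=(F, H))
Wz, Wr, Wc = (rng.normal(scale=0.1, size=(H, H)) for _ in range(3))
Uz, Ur, Uc = (rng.normal(scale=0.1, size=(H, H)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(H, 3))             # -> (belief, disbelief, uncertainty)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.zeros((N, H))                                   # GRU hidden state per node
for t in range(T):
    X_t = rng.normal(size=(N, F))                      # node features at time t (placeholder)
    g = np.maximum(A_norm @ X_t @ W_gcn, 0.0)          # GCN layer: relu(A_norm X W)
    z = sigmoid(g @ Wz + h @ Uz)                       # update gate
    r = sigmoid(g @ Wr + h @ Ur)                       # reset gate
    c = np.tanh(g @ Wc + (r * h) @ Uc)                 # candidate state
    h = (1.0 - z) * h + z * c                          # GRU update

logits = h @ W_out
opinions = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
print(opinions)                                        # each row sums to 1: (b, d, u) per node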
We propose algorithms for performing model checking and control synthesis for discrete-time uncertain systems under linear temporal logic (LTL) specifications. We construct temporal logic trees (TLT) from LTL formulae via reachability analysis. In contrast to automaton-based methods, the construction of the TLT is abstraction-free for infinite systems, that is, we do not construct discrete abstractions of the infinite systems. Moreover, for a given transition system and an LTL formula, we prove that there exist both a universal TLT and an existential TLT via minimal and maximal reachability analysis, respectively. We show that the universal TLT is an underapproximation for the LTL formula and the existential TLT is an overapproximation. We provide sufficient conditions and necessary conditions to verify whether a transition system satisfies an LTL formula by using the TLT approximations. As a major contribution of this work, for a controlled transition system and an LTL formula, we prove that a controlled TLT can be constructed from the LTL formula via control-dependent reachability analysis. Based on the controlled TLT, we design an online control synthesis algorithm, under which a set of feasible control inputs can be generated at each time step. We also prove that this algorithm is recursively feasible. We illustrate the proposed methods for both finite and infinite systems and highlight the generality and online scalability with two simulated examples.
Linear poroelasticity models have a number of important applications in biology and geophysics. In particular, Biot's consolidation model is a well-known model that describes the coupled interaction between the linear response of a porous elastic medium and a diffusive fluid flow within it, assuming small deformations. Although deterministic linear poroelasticity models and finite element methods for solving them numerically have been well studied, there is little work to date on robust algorithms for solving poroelasticity models with uncertain inputs and for performing uncertainty quantification (UQ). The Biot model has a number of important physical parameters and inputs whose precise values are often uncertain in real-world scenarios. In this work, we introduce and analyse the well-posedness of a new five-field model with uncertain and spatially varying Young's modulus and hydraulic conductivity fields. By working with a properly weighted norm, we establish that the weak solution is stable with respect to variations in key physical parameters, including the Poisson ratio. We then introduce a novel locking-free stochastic Galerkin mixed finite element method that is robust in the incompressible limit. Armed with the 'right' norm, we construct a parameter-robust preconditioner for the associated discrete systems. Our new method facilitates forward UQ, allowing efficient calculation of statistical quantities of interest, and is provably robust with respect to variations in the Poisson ratio, the Biot-Willis constant and the storage coefficient, as well as the discretization parameters.
Ji Feng, Yang Yu, Zhi-Hua Zhou (2018)
Multi-layered representation is believed to be a key ingredient of deep neural networks, especially in cognitive tasks like computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are the dominant methods for modeling discrete or tabular data, they are hard to combine with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as its building blocks. The model can be jointly trained by a variant of target propagation across layers, without the need for back-propagation or differentiability. Experiments and visualizations confirm the effectiveness of the model in terms of performance and representation learning ability.
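As a rough, heavily simplified sketch of the layered idea (not the mGBDT paper's algorithm), the code below stacks two GBDT "layers" and performs a single round of target propagation: the output layer F2 is fit on an initial representation h, an inverse model G2 learns to map F2's outputs back to h, the pseudo-target G2(y) is used to refit the inner layer F1, and the stack is rebuilt. The random-projection initialization and all names are illustrative assumptions; the real method iterates this with reconstruction losses.

# Simplified two-layer GBDT stack with one round of target propagation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
rng = np.random.default_rng(0)

# Step 0: initial representation h from a random projection (placeholder initialization).
P = rng.normal(size=(X.shape[1], 3))
h = X @ P

# Step 1: fit the output layer F2 : h -> y.
F2 = GradientBoostingRegressor(random_state=0).fit(h, y)

# Step 2: fit the inverse model G2 : F2(h) -> h, then form pseudo-targets G2(y).
G2 = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(
    F2.predict(h).reshape(-1, 1), h)
h_target = G2.predict(y.reshape(-1, 1))

# Step 3: refit the inner layer F1 : X -> h_target, and rebuild the stack on top of it.
F1 = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, h_target)
F2 = GradientBoostingRegressor(random_state=0).fit(F1.predict(X), y)

print("train MSE:", np.mean((F2.predict(F1.predict(X)) - y) ** 2))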
Training and evaluation of fair classifiers is a challenging problem. This is partly because most fairness metrics of interest depend on both the sensitive attribute information and the label information of the data points. In many scenarios, it is not possible to collect large datasets with such information. A commonly used alternative is to separately train an attribute classifier on data with sensitive attribute information, and then use it later in the ML pipeline to evaluate the bias of a given classifier. While such decoupling helps alleviate the problem of demographic scarcity, it raises several natural questions, such as: how should the attribute classifier be trained, and how should one use a given attribute classifier for accurate bias estimation? In this work we study these questions from both theoretical and empirical perspectives. We first demonstrate experimentally that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model. To investigate this phenomenon further, we analyze an idealized theoretical model and characterize the structure of the optimal classifier. Our analysis has surprising and counter-intuitive implications: in certain regimes, one might want to distribute the error of the attribute classifier as unevenly as possible among the different subgroups. Based on our analysis, we develop heuristics for both training and using attribute classifiers for bias estimation in the data-scarce regime. We empirically demonstrate the effectiveness of our approach on real and simulated data.
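A minimal sketch of the decoupled setup described above (illustrative, not the paper's method): an attribute classifier is trained on a small split where the sensitive attribute is available, and its predicted group labels are then used to estimate the demographic-parity gap of a downstream model on data where the attribute is missing. The synthetic sensitive attribute and all variable names are assumptions made for the example.

# Bias estimation with a separately trained attribute classifier (proxy groups).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=8, random_state=0)
a = (X[:, 0] + rng.normal(scale=0.5, size=len(X)) > 0).astype(int)    # synthetic sensitive attribute

downstream = LogisticRegression(max_iter=1000).fit(X, y)              # model whose bias we audit

# A small split where the sensitive attribute is observed, and a large one where it is not.
X_attr, X_dep, a_attr, a_dep = train_test_split(X, a, test_size=0.75, random_state=0)
attr_clf = LogisticRegression(max_iter=1000).fit(X_attr, a_attr)      # attribute classifier

yhat = downstream.predict(X_dep)
a_hat = attr_clf.predict(X_dep)                                       # proxy group labels

def dp_gap(pred, group):
    """Demographic-parity gap: |P(yhat = 1 | g = 1) - P(yhat = 1 | g = 0)|."""
    return abs(pred[group == 1].mean() - pred[group == 0].mean())

print("estimated gap (proxy attribute):", dp_gap(yhat, a_hat))
print("true gap (hidden attribute):   ", dp_gap(yhat, a_dep))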
