Do you want to publish a course? Click here

Sampling Permutations for Shapley Value Estimation

79   0   0.0 ( 0 )
 Added by Joshua N. Cooper
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

Game-theoretic attribution techniques based on Shapley values are used extensively to interpret black-box machine learning models, but their exact calculation is generally NP-hard, requiring approximation methods for non-trivial models. As the computation of Shapley values can be expressed as a summation over a set of permutations, a common approach is to sample a subset of these permutations for approximation. Unfortunately, standard Monte Carlo sampling methods can exhibit slow convergence, and more sophisticated quasi Monte Carlo methods are not well defined on the space of permutations. To address this, we investigate new approaches based on two classes of approximation methods and compare them empirically. First, we demonstrate quadrature techniques in a RKHS containing functions of permutations, using the Mallows kernel to obtain explicit convergence rates of $O(1/n)$, improving on $O(1/sqrt{n})$ for plain Monte Carlo. The RKHS perspective also leads to quasi Monte Carlo type error bounds, with a tractable discrepancy measure defined on permutations. Second, we exploit connections between the hypersphere $mathbb{S}^{d-2}$ and permutations to create practical algorithms for generating permutation samples with good properties. Experiments show the above techniques provide significant improvements for Shapley value estimates over existing methods, converging to a smaller RMSE in the same number of model evaluations.



rate research

Read More

We propose a new method for modeling the distribution function of high dimensional extreme value distributions. The Pickands dependence function models the relationship between the covariates in the tails, and we learn this function using a neural network that is designed to satisfy its required properties. Moreover, we present new methods for recovering the spectral representation of extreme distributions and propose a generative model for sampling from extreme copulas. Numerical examples are provided demonstrating the efficacy and promise of our proposed methods.
For feature selection and related problems, we introduce the notion of classification game, a cooperative game, with features as players and hinge loss based characteristic function and relate a features contribution to Shapley value based error apportioning (SVEA) of total training error. Our major contribution is ($star$) to show that for any dataset the threshold 0 on SVEA value identifies feature subset whose joint interactions for label prediction is significant or those features that span a subspace where the data is predominantly lying. In addition, our scheme ($star$) identifies the features on which Bayes classifier doesnt depend but any surrogate loss function based finite sample classifier does; this contributes to the excess $0$-$1$ risk of such a classifier, ($star$) estimates unknown true hinge risk of a feature, and ($star$) relate the stability property of an allocation and negative valued SVEA by designing the analogue of core of classification game. Due to Shapley values computationally expensive nature, we build on a known Monte Carlo based approximation algorithm that computes characteristic function (Linear Programs) only when needed. We address the potential sample bias problem in feature selection by providing interval estimates for SVEA values obtained from multiple sub-samples. We illustrate all the above aspects on various synthetic and real datasets and show that our scheme achieves better results than existing recursive feature elimination technique and ReliefF in most cases. Our theoretically grounded classification game in terms of well defined characteristic function offers interpretability (which we formalize in terms of final task) and explainability of our framework, including identification of important features.
105 - Clement Gauchy , Cyril Feau , 2021
As part of Probabilistic Risk Assessment studies, it is necessary to study the fragility of mechanical and civil engineered structures when subjected to seismic loads. This risk can be measured with fragility curves, which express the probability of failure of the structure conditionally to a seismic intensity measure. The estimation of fragility curves relies on time-consuming numerical simulations, so that careful experimental design is required in order to gain the maximum information on the structures fragility with a limited number of code evaluations. We propose and implement an active learning methodology based on adaptive importance sampling in order to reduce the variance of the training loss. The efficiency of the proposed method in terms of bias, standard deviation and prediction interval coverage are theoretically and numerically characterized.
We consider estimating the marginal likelihood in settings with independent and identically distributed (i.i.d.) data. We propose estimating the predictive distributions in a sequential factorization of the marginal likelihood in such settings by using stochastic gradient Markov Chain Monte Carlo techniques. This approach is far more efficient than traditional marginal likelihood estimation techniques such as nested sampling and annealed importance sampling due to its use of mini-batches to approximate the likelihood. Stability of the estimates is provided by an adaptive annealing schedule. The resulting stochastic gradient annealed importance sampling (SGAIS) technique, which is the key contribution of our paper, enables us to estimate the marginal likelihood of a number of models considerably faster than traditional approaches, with no noticeable loss of accuracy. An important benefit of our approach is that the marginal likelihood is calculated in an online fashion as data becomes available, allowing the estimates to be used for applications such as online weighted model combination.
We study covariance matrix estimation for the case of partially observed random vectors, where different samples contain different subsets of vector coordinates. Each observation is the product of the variable of interest with a $0-1$ Bernoulli random variable. We analyze an unbiased covariance estimator under this model, and derive an error bound that reveals relations between the sub-sampling probabilities and the entries of the covariance matrix. We apply our analysis in an active learning framework, where the expected number of observed variables is small compared to the dimension of the vector of interest, and propose a design of optimal sub-sampling probabilities and an active covariance matrix estimation algorithm.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا