Towards Reducing Biases in Combining Multiple Experts Online

Published by: Yi Sun

Publication date: 2019

Research field: Informatics Engineering, Mathematical Statistics

Research language: English

Authors: Yi Sun, Ivan Ramirez, Alfredo Cuesta-Infante

Machine learning

Abstract

In many real-life situations, including job and loan applications, gatekeepers must make justified and fair real-time decisions about a person's fitness for a particular opportunity. In this paper, we aim to accomplish approximate group fairness in an online stochastic decision-making process, where the fairness metric we consider is equalized odds. Our work follows from the classical learning-from-experts scheme, assuming a finite set of classifiers (human experts, rules, options, etc.) that cannot be modified. We run separate instances of the algorithm for each label class and each sensitive group, where the probability of choosing each instance is optimized for both fairness and regret. Our theoretical results show that approximately equalized odds can be achieved without sacrificing much regret. We also demonstrate the performance of the algorithm on real data sets commonly used by the fairness community.
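The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a standard multiplicative-weights (Hedge) learner as the underlying experts algorithm, keeps one instance per (group, label) pair, and picks between a group's two instances uniformly at random, whereas the paper optimizes that probability. The names `Hedge`, `run_per_group_label`, and `equalized_odds_gap` are hypothetical.

```python
import math
import random


class Hedge:
    """Multiplicative-weights learner over a fixed, unmodifiable set of experts."""

    def __init__(self, n_experts, eta=0.5):
        self.weights = [1.0] * n_experts
        self.eta = eta

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def choose(self, rng):
        # Sample an expert index in proportion to its current weight.
        return rng.choices(range(len(self.weights)), weights=self.weights)[0]

    def update(self, losses):
        # Exponentially down-weight experts that incurred loss this round.
        self.weights = [w * math.exp(-self.eta * l)
                        for w, l in zip(self.weights, losses)]


def run_per_group_label(stream, n_experts, eta=0.5, seed=0):
    """stream yields (group, expert_predictions, true_label) with 0/1 labels."""
    rng = random.Random(seed)
    learners = {}
    decisions = []
    for group, preds, label in stream:
        # The label is unknown at decision time, so one of the group's two
        # instances is picked at random (uniformly here: an assumption).
        guess = rng.choice([0, 1])
        learner = learners.setdefault((group, guess), Hedge(n_experts, eta))
        expert = learner.choose(rng)
        decisions.append((group, label, preds[expert]))
        # Once the label is revealed, update the instance that matches it.
        target = learners.setdefault((group, label), Hedge(n_experts, eta))
        target.update([float(p != label) for p in preds])
    return learners, decisions


def equalized_odds_gap(decisions):
    """Largest between-group gap in positive-prediction rate, per true label."""
    counts = {}
    for group, label, pred in decisions:
        pos, tot = counts.get((group, label), (0, 0))
        counts[(group, label)] = (pos + pred, tot + 1)
    gap = 0.0
    for label in (0, 1):
        rates = [pos / tot for (g, l), (pos, tot) in counts.items() if l == label]
        if len(rates) >= 2:
            gap = max(gap, max(rates) - min(rates))
    return gap
```

A gap of 0 returned by `equalized_odds_gap` would mean that, on the observed decisions, true-positive and false-positive rates match across groups, which is the equalized-odds condition the abstract targets approximately.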

Pablo Samuel Castro, Shijian Li, Daqing Zhang (2019)

We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach is based on the idea that agents should not only learn how to behave from experts, but also how not to behave from non-experts. We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert. Our method is particularly useful for problems where we have access to a large set of agent behaviours with varying degrees of expertise (such as through GPS or cellphones). We highlight the differences between our approach and existing methods using a simple grid domain and demonstrate its efficacy on determining passenger-finding strategies for taxi drivers, using a large dataset of GPS trajectories.

Machine learning

Mixture of Robust Experts (MoRE): A Robust Denoising Method towards multiple perturbations

Kaidi Xu, Chenan Wang, Hao Cheng (2021)

To tackle the susceptibility of deep neural networks to adversarial examples, adversarial training has been proposed, which provides a notion of robustness through an inner maximization problem, representing a first-order adversary, embedded within the outer minimization of the training loss. To generalize adversarial robustness over different perturbation types, the adversarial training method has been augmented with an improved inner maximization representing a union of multiple perturbations, e.g., various $\ell_p$ norm-bounded perturbations.
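For orientation, the min-max structure described above is commonly written as follows. This is the standard textbook form of adversarial training over a union of perturbation sets, not necessarily the exact objective of the paper; the symbols $\theta$ (model parameters), $f_\theta$ (the network), $\mathcal{L}$ (training loss), $\mathcal{D}$ (data distribution), $\mathcal{P}$ (the set of norms considered), and $\epsilon_p$ (the budget for each norm) are assumptions introduced here.

$$
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{p \in \mathcal{P}} \; \max_{\|\delta\|_p \le \epsilon_p} \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]
$$

The inner maximization searches for the worst-case perturbation $\delta$ across all allowed $\ell_p$ balls, and the outer minimization trains the model against that worst case.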

Machine learning, Artificial intelligence, Cryptography and security

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

Zakaria Mhammedi, Wouter M. Koolen, Tim van Erven (2019)

We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing n

Machine learning

MetaGrad: Adaptation using Multiple Learning Rates in Online Learning

Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven (2021)

We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is known to imply fast rates in offline statistical learning. MetaGrad further adapts automatically to the size of the gradients. Its main feature is that it simultaneously considers multiple learning rates, which are weighted directly proportional to their empirical performance on the data using a new meta-algorithm. We provide thr

Machine learning

Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision

Li Ye, Yishi Lin, Hong Xie (2020)

A fundamental question for companies with a large amount of logged data is: how can such logged data be used together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users' experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone to make decisions. However, these decisions are not adaptive to the new incoming data, and so a wrong decision will continuously hurt users' experiences. To overcome the aforementioned limitations, we propose a framework to unify offline causal inference algorithms (e.g., weighting, matching) and online learning algorithms (e.g., UCB, LinUCB). We propose novel algorithms and derive bounds on the decision accuracy via the notion of regret. We derive the first upper regret bound for forest-based online bandit algorithms. Experiments on two real datasets show that our algorithms outperform other algorithms that use only logged data or online feedback, or algorithms that do not use the data properly.
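The idea of combining logged data with online learning can be sketched as a UCB1 bandit whose statistics are initialized from logged (arm, reward) pairs. This is only an illustration under a strong assumption: the logged rewards are treated as unbiased samples, whereas the abstract describes correcting them with causal inference (e.g., weighting or matching) before use. The class name `WarmStartUCB` and its interface are hypothetical and do not come from the paper.

```python
import math


class WarmStartUCB:
    """UCB1 whose counts and reward sums start from logged data.

    Assumption: logged rewards are used as-is, with no causal correction.
    """

    def __init__(self, n_arms, logged=()):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        # Replay each logged (arm, reward) pair as if it were observed online.
        for arm, reward in logged:
            self.update(arm, reward)

    def select(self):
        # Play any arm that has no data yet before trusting the indices.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

With enough logged observations per arm, the exploration bonus is already small at the first online round, so fewer exploratory (and potentially harmful) decisions are needed; if the logged data is confounded, however, this naive warm start can lock in a wrong arm, which is the gap the causal-inference step is meant to close.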

Machine learning