
Agnostic learning with unknown utilities

 Added by Kush Bhatia
 Publication date 2021
Language of the research: English





Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems though, the utility of a decision depends on the underlying context $x$ and decision $y$. However, directly incorporating these utilities into the learning objective is often infeasible since these can be quite complex and difficult for humans to specify. We formally study this as agnostic learning with unknown utilities: given a dataset $S = \{x_1, \ldots, x_n\}$ where each data point $x_i \sim \mathcal{D}$, the objective of the learner is to output a function $f$ in some class of decision functions $\mathcal{F}$ with small excess risk. This risk measures the performance of the output predictor $f$ with respect to the best predictor in the class $\mathcal{F}$ on the unknown underlying utility $u^*$. This utility $u^*$ is not assumed to have any specific structure. This raises an interesting question whether learning is even possible in our setup, given that obtaining a generalizable estimate of utility $u^*$ might not be possible from finitely many samples. Surprisingly, we show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well. We study mechanisms for eliciting information which allow a learner to estimate the utilities $u^*$ on the set $S$. We introduce a family of elicitation mechanisms by generalizing comparisons, called the $k$-comparison oracle, which enables the learner to ask for comparisons across $k$ different inputs $x$ at once. We show that the excess risk in our agnostic learning framework decreases at a rate of $O\left(\frac{1}{k}\right)$. This result brings out an interesting accuracy-elicitation trade-off -- as the order $k$ of the oracle increases, the comparative queries become harder to elicit from humans but allow for more accurate learning.
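The claim that utility estimates on the sampled points alone can drive learning can be illustrated with a small plug-in sketch. This is not the paper's algorithm and it does not model the $k$-comparison oracle: it assumes a finite function class, a hypothetical table `u_hat` of already-estimated utilities keyed by `(x, y)`, and toy integer utilities chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Decision = Callable[[int], int]
UtilityTable = Dict[Tuple[int, int], float]


def empirical_utility(f: Decision, samples: List[int], u: UtilityTable) -> float:
    """Average utility of the decisions f makes on the sampled points."""
    return sum(u[(x, f(x))] for x in samples) / len(samples)


def plug_in_learner(function_class: List[Decision], samples: List[int],
                    u_hat: UtilityTable) -> Decision:
    """Pick the function with the highest estimated utility on the sample."""
    return max(function_class, key=lambda f: empirical_utility(f, samples, u_hat))


def empirical_excess(f: Decision, function_class: List[Decision],
                     samples: List[int], u: UtilityTable) -> float:
    """Gap between the best function in the class and f, measured on the sample."""
    best = max(empirical_utility(g, samples, u) for g in function_class)
    return best - empirical_utility(f, samples, u)


def make_threshold(t: int) -> Decision:
    """Hypothetical decision rule: choose y = 1 when x >= t, else y = 0."""
    return lambda x: int(x >= t)


# Toy setup (all values are assumptions made for this example).
samples = [0, 1, 2, 3]
u_hat: UtilityTable = {}
for x in samples:
    u_hat[(x, 1)] = float(x)      # deciding y = 1 pays more for larger x
    u_hat[(x, 0)] = float(3 - x)  # deciding y = 0 pays more for smaller x

function_class = [make_threshold(t) for t in range(5)]
learned = plug_in_learner(function_class, samples, u_hat)
print([learned(x) for x in samples])  # [0, 0, 1, 1]
```

The sketch only evaluates utilities at pairs `(x, f(x))` for sampled `x`, which mirrors the point that no estimate of $u^*$ away from $S$ is needed to select a function; the generalization guarantee itself is the paper's result and is not demonstrated here.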




Read More

Bingcong Li, Tianyi Chen, 2018
This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function involved that become available through feedback, and are used to estimate the gradient appearing in the corresponding iterative algorithms. Since the challenging case of feedback with \emph{unknown} delays prevents one from constructing the sought gradient estimates, existing MAB and BCO algorithms become intractable. For such challenging setups, delayed exploration, exploitation, and exponential (DEXP3) iterations, along with delayed bandit gradient descent (DBGD) iterations are developed for MAB and BCO, respectively. Leveraging a unified analysis framework, it is established that the regret of DEXP3 and DBGD are ${\cal O}\big( \sqrt{K\bar{d}(T+D)} \big)$ and ${\cal O}\big( \sqrt{K(T+D)} \big)$, respectively, where $\bar{d}$ is the maximum delay and $D$ denotes the delay accumulated over $T$ slots. Numerical tests using both synthetic and real data validate the performance of DEXP3 and DBGD.
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold-type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as good as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.
We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $\tilde{\mathcal{O}}(L|X|\sqrt{|A|T})$ regret with high probability, where $L$ is the horizon, $|X|$ is the number of states, $|A|$ is the number of actions, and $T$ is the number of episodes. To the best of our knowledge, our algorithm is the first to ensure $\tilde{\mathcal{O}}(\sqrt{T})$ regret in this challenging setting; in fact it achieves the same regret bound as (Rosenberg & Mansour, 2019a) that considers an easier setting with full-information feedback. Our key technical contributions are two-fold: a tighter confidence set for the transition function, and an optimistic loss estimator that is inversely weighted by an $\textit{upper occupancy bound}$.
The collection and sharing of individuals' data has become commonplace in many industries. Local differential privacy (LDP) is a rigorous approach to preserving data privacy even from a database administrator, unlike the more standard central differential privacy. To achieve LDP, one traditionally adds noise directly to each data dimension, but for high-dimensional data the level of noise required for sufficient anonymization all but entirely destroys the data's utility. In this paper, we introduce a novel LDP mechanism that leverages representation learning to overcome the prohibitive noise requirements of direct methods. We demonstrate that, rather than simply estimating aggregate statistics of the privatized data as is the norm in LDP applications, our method enables the training of performant machine learning models. Unique applications of our approach include private novel-class classification and the augmentation of clean datasets with additional privatized features. Methods that rely on central differential privacy are not applicable to such tasks. Our approach achieves significant performance gains on these tasks relative to state-of-the-art LDP benchmarks that noise data directly.
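The last abstract's remark that per-dimension noise becomes prohibitive in high dimensions can be made concrete with the standard Laplace mechanism. The sketch below is the direct baseline that abstract describes, not its proposed representation-learning mechanism. It assumes each coordinate is clipped to $[0, 1]$, so two records differ by at most `dim` in L1 norm and a Laplace scale of `dim / epsilon` per coordinate gives $\epsilon$-LDP; the function names are hypothetical.

```python
import random
from typing import List


def laplace_scale(dim: int, epsilon: float) -> float:
    """Per-coordinate Laplace scale for epsilon-LDP on records in [0, 1]^dim."""
    return dim / epsilon


def privatize(record: List[float], epsilon: float, rng: random.Random) -> List[float]:
    """Add independent Laplace noise to every coordinate of one record."""
    scale = laplace_scale(len(record), epsilon)
    noisy = []
    for value in record:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(value + noise)
    return noisy


rng = random.Random(0)
low_dim = privatize([0.5] * 3, epsilon=1.0, rng=rng)
print(laplace_scale(3, 1.0), laplace_scale(1000, 1.0))  # 3.0 1000.0
```

With values confined to $[0, 1]$, a noise scale of 1000 at `dim = 1000` swamps the signal in every coordinate, which is the utility loss the abstract refers to.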
