ﻻ يوجد ملخص باللغة العربية
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logisitc bandits that are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., `click vs `no-click). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click, `show me later, `never show again, `no click). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1geq 2$ possible outcomes (+1 stands for the `not click outcome): we assume that for a learners action $mathbf{x}_t$, the user selects one of $K+1geq 2$ outcomes, say outcome $i$, with a multinomial logit (MNL) probabilistic model with corresponding unknown parameter $bar{boldsymboltheta}_{ast i}$. Each outcome $i$ is also associated with a revenue parameter $rho_i$ and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm, that achieves regret $tilde{mathcal{O}}(dKsqrt{T})$ with small dependency on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.
We present ADMM-Softmax, an alternating direction method of multipliers (ADMM) for solving multinomial logistic regression (MLR) problems. Our method is geared toward supervised classification tasks with many examples and features. It decouples the n
We propose a penalized likelihood method that simultaneously fits the multinomial logistic regression model and combines subsets of the response categories. The penalty is non differentiable when pairs of columns in the optimization variable are equa
In recent years, federated learning (FL) has been widely applied for supporting decentralized collaborative learning scenarios. Among existing FL models, federated logistic regression (FLR) is a widely used statistic model and has been used in variou
In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. We call this class of problems
Cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during the interaction with a user. However, the stationary CB model may be too