ﻻ يوجد ملخص باللغة العربية
We consider the bandit problem of selecting $K$ out of $N$ arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among $binom{N}{K}$ options, making the action space large. To simplify the problem, existing works on combinatorial bandits {typically} assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-$K$ subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in $N$. Further, DART achieves a regret bound of $tilde{mathcal{O}}(Ksqrt{KNT})$ for a time horizon $T$, which matches the lower bound in bandit feedback up to a factor of $sqrt{log{2NT}}$. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.
Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most e
In the classical Subset Sum problem we are given a set $X$ and a target $t$, and the task is to decide whether there exists a subset of $X$ which sums to $t$. A recent line of research has resulted in $tilde{O}(t)$-time algorithms, which are (near-)o
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is i
Active learning is an important technique to reduce the number of labeled examples in supervised learning. Active learning for binary classification has been well addressed in machine learning. However, active learning of the reject option classifier
In this paper, we propose an approach for learning sparse reject option classifiers using double ramp loss $L_{dr}$. We use DC programming to find the risk minimizer. The algorithm solves a sequence of linear programs to learn the reject option class