We consider a discounted infinite-horizon optimal stopping problem. If the underlying distribution is known a priori, the solution of this problem is obtained via dynamic programming (DP) and is given by a well-known threshold rule. When information on this distribution is lacking, a natural (though naive) approach is explore-then-exploit, whereby the unknown distribution or its parameters are estimated over an initial exploration phase, and this estimate is then used in the DP to determine actions over the residual exploitation phase. We show: (i) with proper tuning, this approach leads to performance comparable to the full-information DP solution; and (ii) despite common wisdom on the sensitivity of such plug-in approaches in DP due to propagation of estimation errors, a surprisingly short (logarithmic in the horizon) exploration horizon suffices to obtain said performance. In cases where the underlying distribution is heavy-tailed, these observations are even more pronounced: a ${\it single\ sample}$ exploration phase suffices.
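To make the explore-then-exploit recipe concrete, here is a minimal sketch for a stylized version of the problem: i.i.d. offers from an unknown distribution, discount factor $\beta$, and the classical threshold rule in which the continuation value solves $\tau = \beta\,\mathbb{E}[\max(X, \tau)]$. The logarithmic exploration length, the fixed-point solver, and names such as `empirical_threshold` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def empirical_threshold(samples, beta, iters=200):
    """Fixed-point iteration for tau = beta * E[max(X, tau)]
    under the empirical distribution of the exploration samples."""
    tau = np.mean(samples)
    for _ in range(iters):
        tau = beta * np.mean(np.maximum(samples, tau))
    return tau

def explore_then_exploit(offers, beta, explore_len):
    """Observe the first `explore_len` offers (exploration), then stop at the
    first offer exceeding the plug-in threshold (exploitation).
    Returns the discounted reward collected."""
    estimate = empirical_threshold(offers[:explore_len], beta)
    for t in range(explore_len, len(offers)):
        if offers[t] >= estimate:
            return beta ** t * offers[t]
    return 0.0  # never stopped within the simulated horizon

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 10_000
    offers = rng.pareto(2.5, size=T)         # heavy-tailed offer stream
    m = max(1, int(np.ceil(np.log(T))))      # exploration phase ~ log(horizon)
    print(explore_then_exploit(offers, beta=0.95, explore_len=m))
```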
There is a recent surge of interest in designing deep architectures based on the update steps in traditional algorithms, or learning neural networks to improve and replace traditional algorithms. While traditional algorithms have certain stopping cri
How can you sample good negative examples for contrastive learning? We argue that, as with metric learning, contrastive learning of representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point).
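One simple way to bias sampling toward hard negatives, sketched below under assumptions not stated in this abstract, is to draw negatives with probability increasing in their similarity to the anchor; the function name `sample_hard_negatives` and the softmax weighting with inverse temperature `beta` are illustrative choices, not necessarily the paper's method.

```python
import torch
import torch.nn.functional as F

def sample_hard_negatives(anchor, candidates, k, beta=1.0):
    """Sample k negatives per anchor, favouring candidates whose cosine
    similarity to the anchor is high (i.e., harder negatives).
    anchor: (B, D) embeddings, candidates: (N, D) embeddings."""
    a = F.normalize(anchor, dim=-1)
    c = F.normalize(candidates, dim=-1)
    sim = a @ c.t()                                 # (B, N) cosine similarities
    probs = torch.softmax(beta * sim, dim=-1)       # concentrate mass on hard negatives
    idx = torch.multinomial(probs, k, replacement=False)  # (B, k) sampled indices
    return candidates[idx]                          # (B, k, D) hard negatives
```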
Recent few-shot learning works focus on training a model with prior meta-knowledge to adapt quickly to new tasks with unseen classes and samples. However, conventional time-series classification algorithms fail to tackle the few-shot scenario. Existing
Current machine learning has made great progress in computer vision and many other fields, thanks to large amounts of high-quality training samples, while it does not work very well on genomic data analysis, since such data are notoriously known as
We consider the transfer of experience samples (i.e., tuples $\langle s, a, s', r \rangle$) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selectin
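As a rough illustration of reusing such tuples (not the selection mechanisms the abstract refers to), one could seed the target task's replay buffer with source-task transitions and mix them into each training batch; the `MixedReplayBuffer` class and the fixed `source_ratio` below are assumptions made for the sketch.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer that mixes transferred source-task transitions
    with transitions collected in the target task."""

    def __init__(self, source_transitions, capacity=100_000, source_ratio=0.3):
        self.source = list(source_transitions)   # tuples (s, a, s_next, r) from source tasks
        self.target = deque(maxlen=capacity)     # transitions collected in the target task
        self.source_ratio = source_ratio         # fraction of each batch drawn from source data

    def add(self, s, a, s_next, r):
        self.target.append((s, a, s_next, r))

    def sample(self, batch_size):
        # Batches may be smaller than batch_size while the buffers are still filling up.
        n_src = int(self.source_ratio * batch_size) if self.source else 0
        n_tgt = batch_size - n_src
        batch = random.sample(self.source, min(n_src, len(self.source)))
        batch += random.sample(list(self.target), min(n_tgt, len(self.target)))
        return batch
```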