No Arabic abstract
Formal models of learning from teachers need to respect certain criteria to avoid collusion. The most commonly accepted notion of collusion-freeness was proposed by Goldman and Mathias (1996), and various teaching models obeying their criterion have been studied. For each model $M$ and each concept class $mathcal{C}$, a parameter $M$-$mathrm{TD}(mathcal{C})$ refers to the teaching dimension of concept class $mathcal{C}$ in model $M$---defined to be the number of examples required for teaching a concept, in the worst case over all concepts in $mathcal{C}$. This paper introduces a new model of teaching, called no-clash teaching, together with the corresponding parameter $mathrm{NCTD}(mathcal{C})$. No-clash teaching is provably optimal in the strong sense that, given any concept class $mathcal{C}$ and any model $M$ obeying Goldman and Mathiass collusion-freeness criterion, one obtains $mathrm{NCTD}(mathcal{C})le M$-$mathrm{TD}(mathcal{C})$. We also study a corresponding notion $mathrm{NCTD}^+$ for the case of learning from positive data only, establish useful bounds on $mathrm{NCTD}$ and $mathrm{NCTD}^+$, and discuss relations of these parameters to the VC-dimension and to sample compression. In addition to formulating an optimal model of collusion-free teaching, our main results are on the computational complexity of deciding whether $mathrm{NCTD}^+(mathcal{C})=k$ (or $mathrm{NCTD}(mathcal{C})=k$) for given $mathcal{C}$ and $k$. We show some such decision problems to be equivalent to the existence question for certain constrained matchings in bipartite graphs. Our NP-hardness results for the latter are of independent interest in the study of constrained graph matchings.
Iterative machine teaching is a method for selecting an optimal teaching example that enables a student to efficiently learn a target concept at each iteration. Existing studies on iterative machine teaching are based on supervised machine learning and assume that there are teachers who know the true answers of all teaching examples. In this study, we consider an unsupervised case where such teachers do not exist; that is, we cannot access the true answer of any teaching example. Students are given a teaching example at each iteration, but there is no guarantee if the corresponding label is correct. Recent studies on crowdsourcing have developed methods for estimating the true answers from crowdsourcing responses. In this study, we apply these to iterative machine teaching for estimating the true labels of teaching examples along with student models that are used for teaching. Our method supports the collaborative learning of students without teachers. The experimental results show that the teaching performance of our method is particularly effective for low-level students in particular.
Algorithmic machine teaching studies the interaction between a teacher and a learner where the teacher selects labeled examples aiming at teaching a target hypothesis. In a quest to lower teaching complexity, several teaching models and complexity measures have been proposed for both the batch settings (e.g., worst-case, recursive, preference-based, and non-clashing models) and the sequential settings (e.g., local preference-based model). To better understand the connections between these models, we develop a novel framework that captures the teaching process via preference functions $Sigma$. In our framework, each function $sigma in Sigma$ induces a teacher-learner pair with teaching complexity as $TD(sigma)$. We show that the above-mentioned teaching models are equivalent to specific types/families of preference functions. We analyze several properties of the teaching complexity parameter $TD(sigma)$ associated with different families of the preference functions, e.g., comparison to the VC dimension of the hypothesis class and additivity/sub-additivity of $TD(sigma)$ over disjoint domains. Finally, we identify preference functions inducing a novel family of sequential models with teaching complexity linear in the VC dimension: this is in contrast to the best-known complexity result for the batch models, which is quadratic in the VC dimension.
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation. In this paper, we propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs) that achieves regret bound of $O(sqrt{T})$ for the general class of weakly communicating MDPs, where $T$ is the number of interactions. EE-QL assumes that an online concentrating approximation of the optimal average reward is available. This is the first model-free learning algorithm that achieves $O(sqrt T)$ regret without the ergodic assumption, and matches the lower bound in terms of $T$ except for logarithmic factors. Experiments show that the proposed algorithm performs as well as the best known model-based algorithms.
We present algorithms for efficiently learning regularizers that improve generalization. Our approach is based on the insight that regularizers can be viewed as upper bounds on the generalization gap, and that reducing the slack in the bound can improve performance on test data. For a broad class of regularizers, the hyperparameters that give the best upper bound can be computed using linear programming. Under certain Bayesian assumptions, solving the LP lets us jump to the optimal hyperparameters given very limited data. This suggests a natural algorithm for tuning regularization hyperparameters, which we show to be effective on both real and synthetic data.
The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can be beneficial in such settings and can reduce the number of true labels required without compromising the evaluation accuracy. Stratified sampling exploits statistical properties (e.g., variance) across strata of the unlabeled population, though usually under the unrealistic assumption that these properties are known. We propose two new algorithms that simultaneously estimate these properties and optimize the evaluation accuracy. We construct a lower bound to show the proposed algorithms (to log-factors) are rate optimal. Experiments on synthetic and real data show the reduction in label complexity that is enabled by our algorithms.