ﻻ يوجد ملخص باللغة العربية
We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{mathrm{err}}>0$ such that if a pseudolabeler $boldsymbol{beta}_{mathrm{pl}}$ can achieve classification error at most $C_{mathrm{err}}$, then for any $varepsilon>0$, an iterative self-training algorithm initialized at $boldsymbol{beta}_0 := boldsymbol{beta}_{mathrm{pl}}$ using pseudolabels $hat y = mathrm{sgn}(langle boldsymbol{beta}_t, mathbf{x}rangle)$ and using at most $tilde O(d/varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $boldsymbol{beta}_{mathrm{pl}}$ with classification error $C_{mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $varepsilon$). Together our results imply that mixture models can be learned to within $varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $tilde O(d/varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contra
The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustne
A meta-model is trained on a distribution of similar tasks such that it learns an algorithm that can quickly adapt to a novel task with only a handful of labeled examples. Most of current meta-learning methods assume that the meta-training set consis
Meta-Learning (ML) has proven to be a useful tool for training Few-Shot Learning (FSL) algorithms by exposure to batches of tasks sampled from a meta-dataset. However, the standard training procedure overlooks the dynamic nature of the real-world whe
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-speci