Labelled data often comes at a high cost, as it may require recruiting human labellers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of \emph{semi-supervised active learning} through the frame of linear regression. In this setting, the learner has access to a dataset $X \in \mathbb{R}^{(n_1+n_2) \times d}$ composed of $n_1$ unlabelled examples that an algorithm can actively query, and $n_2$ examples labelled a priori. Concretely, denoting the true labels by $Y \in \mathbb{R}^{n_1 + n_2}$, the learner's objective is to find $\widehat{\beta} \in \mathbb{R}^d$ such that
\begin{equation}
\| X \widehat{\beta} - Y \|_2^2 \le (1 + \epsilon) \min_{\beta \in \mathbb{R}^d} \| X \beta - Y \|_2^2
\end{equation}
while making as few additional label queries as possible. In order to bound the label queries, we introduce an instance-dependent parameter called the reduced rank, denoted by $R_X$, and propose an efficient algorithm with query complexity $O(R_X/\epsilon)$. This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reduced rank equates to the statistical dimension, $\mathrm{sd}_\lambda$, and the effective dimension, $d_\lambda$, of the problem respectively, and $\lambda \ge 0$ denotes the regularization parameter. For active ridge regression we also prove a matching lower bound of $\Omega(\mathrm{sd}_\lambda / \epsilon)$ on the query complexity of any algorithm. This subsumes prior work that only considered the unregularized case, i.e., $\lambda = 0$.
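The $(1+\epsilon)$ guarantee above can be made concrete with a small experiment. Below is a minimal NumPy sketch, assuming ridge leverage-score sampling (a standard tool in active regression, not necessarily the paper's algorithm): it spends a label budget with probability proportional to the ridge leverage scores, solves the reweighted problem, and empirically checks the $(1+\epsilon)$-style ratio against the full ridge solution. All names here (`ridge_leverage_scores`, `n_queries`, `lam`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 10, 1.0
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
Y = X @ beta_true + 0.1 * rng.normal(size=n)    # true labels, hidden from the learner

def ridge_leverage_scores(X, lam):
    # tau_i = x_i^T (X^T X + lam I)^{-1} x_i
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.einsum("ij,jk,ik->i", X, np.linalg.inv(G), X)

tau = ridge_leverage_scores(X, lam)
p = tau / tau.sum()
n_queries = 100                                  # assumed label budget
idx = rng.choice(n, size=n_queries, replace=True, p=p)
w = 1.0 / (n_queries * p[idx])                   # importance weights

# Solve the reweighted ridge problem using only the queried labels.
Xs = X[idx] * np.sqrt(w)[:, None]
Ys = Y[idx] * np.sqrt(w)
beta_hat = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ Ys)

# Compare ridge objectives against the solution that sees every label.
beta_opt = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
err_hat = np.sum((X @ beta_hat - Y) ** 2) + lam * np.sum(beta_hat ** 2)
err_opt = np.sum((X @ beta_opt - Y) ** 2) + lam * np.sum(beta_opt ** 2)
print(f"objective ratio = {err_hat / err_opt:.3f}")
```

In this synthetic setup the printed ratio should hover near 1, which is the behaviour the $(1+\epsilon)$ guarantee formalizes; the point of the abstract's result is to bound how large the budget must be, namely $O(R_X/\epsilon)$ queries.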
Twin neural network regression (TNNR) is a semi-supervised regression algorithm: it can be trained on unlabelled data points as long as other, labelled anchor data points are present. TNNR is trained to predict differences between the target values …
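As a rough illustration of the twin-regression idea, the sketch below substitutes a plain linear difference model for the twin neural network, so it is an assumed stand-in rather than the TNNR architecture: it fits a predictor of pairwise target differences, then predicts a new point by averaging anchor label plus predicted difference over all labelled anchors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.05 * rng.normal(size=n)

# Build training pairs (x_i, x_j) with target y_i - y_j.
i = rng.integers(0, n, size=2000)
j = rng.integers(0, n, size=2000)
pair_feats = X[i] - X[j]             # linear stand-in for the twin network's pair input
pair_targets = y[i] - y[j]

# Fit the difference predictor f(x, x') ~ y(x) - y(x') by least squares.
w, *_ = np.linalg.lstsq(pair_feats, pair_targets, rcond=None)

# Predict a new point by anchoring on every labelled example and averaging:
#   y_hat(x) = mean_k [ y_k + f(x, x_k) ]
x_new = rng.normal(size=d)
y_new_hat = np.mean(y + (x_new - X) @ w)
print(y_new_hat)
```

Averaging over many anchors is what makes the difference formulation useful: each anchor gives an independent estimate of the target, and unlabelled points can enter training through loop-consistency constraints on predicted differences.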
Ordinal regression aims to predict an ordinal class label. In this paper, we consider its semi-supervised formulation, in which we have unlabeled data along with ordinal-labeled data to train an ordinal regressor. There are several metrics to …
Active learning is an effective technique for reducing the labeling cost by improving data efficiency. In this work, we propose a novel batch acquisition strategy for active learning in the setting where the model training is performed in a semi-supervised manner …
Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high-value data that can best improve model performance. In pool-based active learning, accessible unlabeled data are not used …
Active learning is widely used to reduce labeling effort and training time by repeatedly querying only the most beneficial samples from unlabeled data. In real-world problems where data cannot be stored indefinitely due to limited storage or privacy …