Do you want to publish a course? Click here

Semi-supervised Batch Active Learning via Bilevel Optimization

171   0   0.0 ( 0 )
 Added by Zal\\'an Borsos
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Active learning is an effective technique for reducing the labeling cost by improving data efficiency. In this work, we propose a novel batch acquisition strategy for active learning in the setting where the model training is performed in a semi-supervised manner. We formulate our approach as a data summarization problem via bilevel optimization, where the queried batch consists of the points that best summarize the unlabeled data pool. We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.



rate research

Read More

Semi-Supervised Learning (SSL) has achieved great success in overcoming the difficulties of labeling and making full use of unlabeled data. However, SSL has a limited assumption that the numbers of samples in different classes are balanced, and many SSL algorithms show lower performance for the datasets with the imbalanced class distribution. In this paper, we introduce a task of class-imbalanced semi-supervised learning (CISSL), which refers to semi-supervised learning with class-imbalanced data. In doing so, we consider class imbalance in both labeled and unlabeled sets. First, we analyze existing SSL methods in imbalanced environments and examine how the class imbalance affects SSL methods. Then we propose Suppressed Consistency Loss (SCL), a regularization method robust to class imbalance. Our method shows better performance than the conventional methods in the CISSL environment. In particular, the more severe the class imbalance and the smaller the size of the labeled data, the better our method performs.
Graph convolutional neural networks~(GCNs) have recently demonstrated promising results on graph-based semi-supervised classification, but little work has been done to explore their theoretical properties. Recently, several deep neural networks, e.g., fully connected and convolutional neural networks, with infinite hidden units have been proved to be equivalent to Gaussian processes~(GPs). To exploit both the powerful representational capacity of GCNs and the great expressive power of GPs, we investigate similar properties of infinitely wide GCNs. More specifically, we propose a GP regression model via GCNs~(GPGC) for graph-based semi-supervised learning. In the process, we formulate the kernel matrix computation of GPGC in an iterative analytical form. Finally, we derive a conditional distribution for the labels of unobserved nodes based on the graph structure, labels for the observed nodes, and the feature matrix of all the nodes. We conduct extensive experiments to evaluate the semi-supervised classification performance of GPGC and demonstrate that it outperforms other state-of-the-art methods by a clear margin on all the datasets while being efficient.
Labelled data often comes at a high cost as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of $semi-supervised$ $active$ $learning$ through the frame of linear regression. In this setting, the learner has access to a dataset $X in mathbb{R}^{(n_1+n_2) times d}$ which is composed of $n_1$ unlabelled examples that an algorithm can actively query, and $n_2$ examples labelled a-priori. Concretely, denoting the true labels by $Y in mathbb{R}^{n_1 + n_2}$, the learners objective is to find $widehat{beta} in mathbb{R}^d$ such that, begin{equation} | X widehat{beta} - Y |_2^2 le (1 + epsilon) min_{beta in mathbb{R}^d} | X beta - Y |_2^2 end{equation} while making as few additional label queries as possible. In order to bound the label queries, we introduce an instance dependent parameter called the reduced rank, denoted by $R_X$, and propose an efficient algorithm with query complexity $O(R_X/epsilon)$. This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reduced-rank equates to the statistical dimension, $sd_lambda$ and effective dimension, $d_lambda$ of the problem respectively, where $lambda ge 0$ denotes the regularization parameter. For active ridge regression we also prove a matching lower bound of $O(sd_lambda / epsilon)$ on the query complexity of any algorithm. This subsumes prior work that only considered the unregularized case, i.e., $lambda = 0$.
102 - Kaiyi Ji 2021
Bilevel optimization has become a powerful framework in various machine learning applications including meta-learning, hyperparameter optimization, and network architecture search. There are generally two classes of bilevel optimization formulations for machine learning: 1) problem-based bilevel optimization, whose inner-level problem is formulated as finding a minimizer of a given loss function; and 2) algorithm-based bilevel optimization, whose inner-level solution is an output of a fixed algorithm. For the first class, two popular types of gradient-based algorithms have been proposed for hypergradient estimation via approximate implicit differentiation (AID) and iterative differentiation (ITD). Algorithms for the second class include the popular model-agnostic meta-learning (MAML) and almost no inner loop (ANIL). However, the convergence rate and fundamental limitations of bilevel optimization algorithms have not been well explored. This thesis provides a comprehensive convergence rate analysis for bilevel algorithms in the aforementioned two classes. We further propose principled algorithm designs for bilevel optimization with higher efficiency and scalability. For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms. We then develop acceleration bilevel algorithms, for which we provide shaper convergence analysis with relaxed assumptions. We also provide the first lower bounds for bilevel optimization, and establish the optimality by providing matching upper bounds under certain conditions. We finally propose new stochastic bilevel optimization algorithms with lower complexity and higher efficiency in practice. For the algorithm-based formulation, we develop a theoretical convergence for general multi-step MAML and ANIL, and characterize the impact of parameter selections and loss geometries on the their complexities.
The objective of active learning (AL) is to train classification models with less number of labeled instances by selecting only the most informative instances for labeling. The AL algorithms designed for other data types such as images and text do not perform well on graph-structured data. Although a few heuristics-based AL algorithms have been proposed for graphs, a principled approach is lacking. In this paper, we propose MetAL, an AL approach that selects unlabeled instances that directly improve the future performance of a classification model. For a semi-supervised learning problem, we formulate the AL task as a bilevel optimization problem. Based on recent work in meta-learning, we use the meta-gradients to approximate the impact of retraining the model with any unlabeled instance on the model performance. Using multiple graph datasets belonging to different domains, we demonstrate that MetAL efficiently outperforms existing state-of-the-art AL algorithms.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا