ترغب بنشر مسار تعليمي؟ اضغط هنا

Multi-Class Classification from Noisy-Similarity-Labeled Data

101   0   0.0 ( 0 )
 نشر من قبل Tongliang Liu
 تاريخ النشر 2020
والبحث باللغة English




اسأل ChatGPT حول البحث

A similarity label indicates whether two instances belong to the same class while a class label shows the class of the instance. Without class labels, a multi-class classifier could be learned from similarity-labeled pairwise data by meta classification learning. However, since the similarity label is less informative than the class label, it is more likely to be noisy. Deep neural networks can easily remember noisy data, leading to overfitting in classification. In this paper, we propose a method for learning from only noisy-similarity-labeled data. Specifically, to model the noise, we employ a noise transition matrix to bridge the class-posterior probability between clean and noisy data. We further estimate the transition matrix from only noisy data and build a novel learning system to learn a classifier which can assign noise-free class labels for instances. Moreover, we theoretically justify how our proposed method generalizes for learning classifiers. Experimental results demonstrate the superiority of the proposed method over the state-of-the-art method on benchmark-simulated and real-world noisy-label datasets.

قيم البحث

اقرأ أيضاً

170 - Yuzhou Cao , Lei Feng , Senlin Shu 2021
Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rig orous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an empirical risk minimization framework that is loss-/model-/optimizer-independent. Instead of constructing a boundary between the given class and other classes, our method can conduct discriminative classification between all the classes even if no data from the other classes are provided. We further theoretically and experimentally show that our method can be Bayes-consistent with a simple modification even if the provided confidences are highly noisy. Then, we provide an extension of our method for the case where data from a subset of all the classes are available. Experimental results demonstrate the effectiveness of our methods.
Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There exist many in expensive data sources on the web, but they tend to contain inaccurate labels. Training on noisy labeled datasets causes performance degradation because DNNs can easily overfit to the label noise. To overcome this problem, we propose a noise-tolerant training algorithm, where a meta-learning update is performed prior to conventional gradient update. The proposed meta-learning method simulates actual training by generating synthetic noisy labels, and train the model such that after one gradient update using each set of synthetic noisy labels, the model does not overfit to the specific noise. We conduct extensive experiments on the noisy CIFAR-10 dataset and the Clothing1M dataset. The results demonstrate the advantageous performance of the proposed method compared to several state-of-the-art baselines.
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via the se means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set, and guide supervised training to spend time on a class proportional to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media. Due to privacy concerns and governmental regulations, such a data stream can only be stored and used for learning p urposes in a limited duration. To overcome the noise in this on-line scenario we propose QActor which novel combines: the selection of supposedly clean samples via quality models and actively querying an oracle for the most informative true labels. While the former can suffer from low data volumes of on-line scenarios, the latter is constrained by the availability and costs of human experts. QActor swiftly combines the merits of quality models for data filtering and oracle queries for cleaning the most informative data. The objective of QActor is to leverage the stringent oracle budget to robustly maximize the learning accuracy. QActor explores various strategies combining different query allocations and uncertainty measures. A central feature of QActor is to dynamically adjust the query limit according to the learning loss for each data batch. We extensively evaluate different image datasets fed into the classifier that can be standard machine learning (ML) models or deep neural networks (DNN) with noise label ratios ranging between 30% and 80%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of at most an additional 6% of ground truth data from the oracle.
With the dramatic increase of dimensions in the data representation, extracting latent low-dimensional features becomes of the utmost importance for efficient classification. Aiming at the problems of unclear margin representation and difficulty in r evealing the data manifold structure in most of the existing linear discriminant methods, we propose a new discriminant feature extraction framework, namely Robust Locality-Aware Regression (RLAR). In our model, we introduce a retargeted regression to perform the marginal representation learning adaptively instead of using the general average inter-class margin. Besides, we formulate a new strategy for enhancing the local intra-class compactness of the data manifold, which can achieve the joint learning of locality-aware graph structure and desirable projection matrix. To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by the L2,1 norm. Further, forcing the row sparsity on the projection matrix through the L2,1 norm achieves the cooperation of feature selection and feature extraction. Then, we derive an effective iterative algorithm for solving the proposed model. The experimental results over a range of UCI data sets and other benchmark databases demonstrate that the proposed RLAR outperforms some state-of-the-art approaches.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا