
Noisy Labels Can Induce Good Representations

Published by: Jingling Li
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





The current success of deep learning depends on large-scale labeled datasets. In practice, high-quality annotations are expensive to collect, but noisy annotations are more affordable. Previous works report mixed empirical results when training with noisy labels: neural networks can easily memorize random labels, but they can also generalize from noisy labels. To explain this puzzle, we study how architecture affects learning with noisy labels. We observe that if an architecture suits the task, training with noisy labels can induce useful hidden representations, even when the model generalizes poorly; i.e., the last few layers of the model are more negatively affected by noisy labels. This finding leads to a simple method to improve models trained on noisy labels: replacing the final dense layers with a linear model, whose weights are learned from a small set of clean data. We empirically validate our findings across three architectures (Convolutional Neural Networks, Graph Neural Networks, and Multi-Layer Perceptrons) and two domains (graph algorithmic tasks and image classification). Furthermore, we achieve state-of-the-art results on image classification benchmarks by combining our method with existing approaches on noisy label training.
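To make the proposed fix concrete, here is a minimal sketch of the head-replacement step: train a network on noisy labels, then discard the final dense layers and fit a linear classifier on the penultimate-layer features using a small clean set. This is an illustrative reading of the abstract, not the authors' exact code; the names trunk, clean_x, and clean_y, and the choice of scikit-learn's LogisticRegression as the linear model, are assumptions.

import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(trunk: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Hidden representation from the noisy-trained trunk (all layers
    # except the final dense layers, which the method discards).
    trunk.eval()
    return trunk(x).flatten(start_dim=1)

def fit_linear_head(trunk: nn.Module, clean_x: torch.Tensor, clean_y: torch.Tensor):
    # Fit a linear model on frozen features using a small clean set.
    feats = extract_features(trunk, clean_x).cpu().numpy()
    head = LogisticRegression(max_iter=1000)
    head.fit(feats, clean_y.cpu().numpy())
    return head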

Read also

Learning with noisy labels is an important and challenging task for training accurate deep neural networks. Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels. Robust loss functions that satisfy the symmetric condition were tailored to remedy this problem, but they encounter an underfitting effect. In this paper, we theoretically prove that any loss can be made robust to noisy labels by restricting the network output to the set of permutations over a fixed vector. When the fixed vector is one-hot, we only need to constrain the output to be one-hot, which however produces zero gradients almost everywhere and thus makes gradient-based optimization difficult. In this work, we introduce a sparse regularization strategy to approximate the one-hot constraint, composed of a network output sharpening operation that enforces the output distribution of the network to be sharp, and $\ell_p$-norm ($p \le 1$) regularization that promotes the network output to be sparse. This simple approach guarantees the robustness of arbitrary loss functions while not hindering the fitting ability. Experimental results demonstrate that our method can significantly improve the performance of commonly-used loss functions in the presence of noisy labels and class imbalance, and outperform the state-of-the-art methods. The code is available at https://github.com/hitcszx/lnl_sr.
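The two ingredients above (output sharpening plus an $\ell_p$ penalty on the output distribution) can be sketched as follows. This is a rough reading of the abstract, not the released code: the temperature tau, the exponent p, and the weight lam are illustrative assumptions, and the penalty uses the sum of p-th powers of the output probabilities, which for $p \le 1$ is smallest at one-hot outputs.

import torch
import torch.nn.functional as F

def sparse_reg_loss(logits, targets, tau=0.5, p=0.7, lam=1.0, eps=1e-8):
    # Sketch of the sparse regularization strategy: cross entropy on a
    # sharpened output distribution, plus an ell_p (p <= 1) penalty that
    # pushes the output distribution toward a one-hot (sparse) vector.
    probs = F.softmax(logits, dim=1)
    # Output sharpening: a temperature below 1 makes the distribution peakier.
    sharpened = probs.pow(1.0 / tau)
    sharpened = sharpened / sharpened.sum(dim=1, keepdim=True)
    ce = F.nll_loss(torch.log(sharpened + eps), targets)
    # sum_j probs_j^p (= ||probs||_p^p) is minimized at one-hot outputs for p <= 1.
    lp = probs.pow(p).sum(dim=1).mean()
    return ce + lam * lp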
Ankit Dhall (2020)
Image classification has been studied extensively, but there has been limited work on using non-conventional, external guidance beyond traditional image-label pairs to train such models. In this thesis we present a set of methods to leverage information about the semantic hierarchy induced by class labels. In the first part of the thesis, we inject label-hierarchy knowledge into an arbitrary classifier and empirically show that the availability of such external semantic information, in conjunction with the visual semantics from images, boosts overall performance. Taking a step further in this direction, we model the label-label and label-image interactions more explicitly by using order-preserving embedding-based models, prevalent in natural language processing, and tailor them to the domain of computer vision to perform image classification. Although contrasting in nature, both the CNN classifiers injected with hierarchical information and the embedding-based models outperform a hierarchy-agnostic model on the newly presented, real-world ETH Entomological Collection image dataset: https://www.research-collection.ethz.ch/handle/20.500.11850/365379.
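For the order-preserving embeddings mentioned above, a common formulation from the language literature is the order-embedding penalty: under the convention that more general concepts have coordinate-wise smaller embeddings, a hypernym pair is penalized whenever the general concept fails to lie below the specific one. A minimal sketch, with (batch, dim) tensors assumed:

import torch

def order_violation(specific: torch.Tensor, general: torch.Tensor) -> torch.Tensor:
    # Order-embedding penalty: zero when general <= specific holds in every
    # coordinate (the hierarchy relation is satisfied), positive otherwise.
    return torch.clamp(general - specific, min=0).pow(2).sum(dim=1)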
Lu Jiang, Di Huang, Mason Liu (2019)
Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting. This paper makes three contributions. First, we establish the first benchmark of controlled real-world label noise from the web. This new benchmark enables us to study web label noise in a controlled setting for the first time. The second contribution is a simple but effective method to overcome both synthetic and real noisy labels. We show that our method achieves the best result on our dataset as well as on two public benchmarks (CIFAR and WebVision). Third, we conduct the largest study by far into understanding deep neural networks trained on noisy labels across different noise levels, noise types, network architectures, and training settings. The data and code are released at the following link: http://www.lujiang.info/cnlw.html
A deep neural network trained on noisy labels is known to quickly lose its power to discriminate clean instances from noisy ones. After the early learning phase has ended, the network memorizes the noisy instances, which leads to a significant degradation in its generalization performance. To resolve this issue, we propose MARVEL (MARgins Via Early Learning), a new robust learning method in which the memorization of noisy instances is curbed. We propose a new test statistic that tracks the goodness of fit of every instance based on the epoch history of its classification margins. If an instance's classification margin is small over a sequence of consecutive learning epochs, that instance is declared noisy and the network abandons learning on it. Consequently, the network first flags a possibly noisy instance, then waits to see whether learning on that instance can be improved; if not, the network learns with confidence that this instance can be safely abandoned. We also propose MARVEL+, where arduous instances can be upweighted, enabling the network to focus and improve its learning on them and consequently its generalization. Experimental results on benchmark datasets with synthetic label noise and on real-world datasets show that MARVEL outperforms other baselines consistently across different noise levels, with a significantly larger margin under asymmetric noise.
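A rough sketch of the margin-tracking idea described above: record each instance's classification margin every epoch, and once the margin has stayed below a threshold for a run of consecutive epochs, drop that instance's loss weight to zero. The window size, threshold, and class name below are illustrative assumptions, not MARVEL's exact test statistic.

import torch

class MarginTracker:
    # Flags instances whose classification margin stays below `threshold`
    # for `window` consecutive epochs; flagged instances get zero loss weight.
    def __init__(self, n_instances: int, window: int = 5, threshold: float = 0.0):
        self.window = window
        self.threshold = threshold
        self.low_streak = torch.zeros(n_instances, dtype=torch.long)
        self.abandoned = torch.zeros(n_instances, dtype=torch.bool)

    def update(self, idx: torch.Tensor, logits: torch.Tensor, labels: torch.Tensor):
        # Margin = logit of the given label minus the largest other logit.
        label_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        others = logits.scatter(1, labels.unsqueeze(1), float('-inf'))
        margin = label_logit - others.max(dim=1).values
        low = margin < self.threshold
        self.low_streak[idx] = torch.where(
            low, self.low_streak[idx] + 1, torch.zeros_like(self.low_streak[idx]))
        self.abandoned[idx] = self.abandoned[idx] | (self.low_streak[idx] >= self.window)

    def weights(self, idx: torch.Tensor) -> torch.Tensor:
        # Per-instance loss weights: 0 for abandoned (likely noisy) instances.
        return (~self.abandoned[idx]).float()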
Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning with noisy labels for various types of noise. We investigate general theoretical properties of asymmetric loss functions, including classification calibration, excess risk bound, and noise tolerance. Meanwhile, we introduce the asymmetry ratio to measure the asymmetry of a loss function. The empirical results show that a higher ratio provides better noise tolerance. Moreover, we modify several commonly-used loss functions and establish the necessary and sufficient conditions for them to be asymmetric. Experimental results on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods. The code is available at https://github.com/hitcszx/ALFs
