No Arabic abstract
Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to be generally applied in practice. Besides, the non-convexity brought by the loss as well as the complicated network architecture makes it easily trapped into an unexpected solution with poor generalization capability. To address above issues, we propose a meta-learning method capable of adaptively learning hyperparameter in robust loss functions. Specifically, through mutual amelioration between robust loss hyperparameter and network parameters in our method, both of them can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of SOTA robust loss functions are attempted to be integrated into our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its accuracy and generalization performance, as compared with conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.
Learning with curriculum has shown great effectiveness in tasks where the data contains noisy (corrupted) labels, since the curriculum can be used to re-weight or filter out noisy samples via proper design. However, obtaining curriculum from a learner itself without additional supervision or feedback deteriorates the effectiveness due to sample selection bias. Therefore, methods that involve two or more networks have been recently proposed to mitigate such bias. Nevertheless, these studies utilize the collaboration between networks in a way that either emphasizes the disagreement or focuses on the agreement while ignores the other. In this paper, we study the underlying mechanism of how disagreement and agreement between networks can help reduce the noise in gradients and develop a novel framework called Robust Collaborative Learning (RCL) that leverages both disagreement and agreement among networks. We demonstrate the effectiveness of RCL on both synthetic benchmark image data and real-world large-scale bioinformatics data.
Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely textit{asymmetric loss functions}, which are robust to learning with noisy labels for various types of noise. We investigate general theoretical properties of asymmetric loss functions, including classification calibration, excess risk bound, and noise tolerance. Meanwhile, we introduce the asymmetry ratio to measure the asymmetry of a loss function. The empirical results show that a higher ratio would provide better noise tolerance. Moreover, we modify several commonly-used loss functions and establish the necessary and sufficient conditions for them to be asymmetric. Experimental results on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods. The code is available at href{https://github.com/hitcszx/ALFs}{https://github.com/hitcszx/ALFs}
Recently deep neural networks have shown their capacity to memorize training data, even with noisy labels, which hurts generalization performance. To mitigate this issue, we provide a simple but effective baseline method that is robust to noisy labels, even with severe noise. Our objective involves a variance regularization term that implicitly penalizes the Jacobian norm of the neural network on the whole training set (including the noisy-labeled data), which encourages generalization and prevents overfitting to the corrupted labels. Experiments on both synthetically generated incorrect labels and realistic large-scale noisy datasets demonstrate that our approach achieves state-of-the-art performance with a high tolerance to severe noise.
Deep Learning systems have shown tremendous accuracy in image classification, at the cost of big image datasets. Collecting such amounts of data can lead to labelling errors in the training set. Indexing multimedia content for retrieval, classification or recommendation can involve tagging or classification based on multiple criteria. In our case, we train face recognition systems for actors identification with a closed set of identities while being exposed to a significant number of perturbators (actors unknown to our database). Face classifiers are known to be sensitive to label noise. We review recent works on how to manage noisy annotations when training deep learning classifiers, independently from our interest in face recognition.
Recent studies on the memorization effects of deep neural networks on noisy labels show that the networks first fit the correctly-labeled training samples before memorizing the mislabeled samples. Motivated by this early-learning phenomenon, we propose a novel method to prevent memorization of the mislabeled samples. Unlike the existing approaches which use the model output to identify or ignore the mislabeled samples, we introduce an indicator branch to the original model and enable the model to produce a confidence value for each sample. The confidence values are incorporated in our loss function which is learned to assign large confidence values to correctly-labeled samples and small confidence values to mislabeled samples. We also propose an auxiliary regularization term to further improve the robustness of the model. To improve the performance, we gradually correct the noisy labels with a well-designed target estimation strategy. We provide the theoretical analysis and conduct the experiments on synthetic and real-world datasets, demonstrating that our approach achieves comparable results to the state-of-the-art methods.