ﻻ يوجد ملخص باللغة العربية
Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media. Due to privacy concerns and governmental regulations, such a data stream can only be stored and used for learning purposes in a limited duration. To overcome the noise in this on-line scenario we propose QActor which novel combines: the selection of supposedly clean samples via quality models and actively querying an oracle for the most informative true labels. While the former can suffer from low data volumes of on-line scenarios, the latter is constrained by the availability and costs of human experts. QActor swiftly combines the merits of quality models for data filtering and oracle queries for cleaning the most informative data. The objective of QActor is to leverage the stringent oracle budget to robustly maximize the learning accuracy. QActor explores various strategies combining different query allocations and uncertainty measures. A central feature of QActor is to dynamically adjust the query limit according to the learning loss for each data batch. We extensively evaluate different image datasets fed into the classifier that can be standard machine learning (ML) models or deep neural networks (DNN) with noise label ratios ranging between 30% and 80%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of at most an additional 6% of ground truth data from the oracle.
A similarity label indicates whether two instances belong to the same class while a class label shows the class of the instance. Without class labels, a multi-class classifier could be learned from similarity-labeled pairwise data by meta classificat
Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There exist many in
Active Learning is essential for more label-efficient deep learning. Bayesian Active Learning has focused on BALD, which reduces model parameter uncertainty. However, we show that BALD gets stuck on out-of-distribution or junk data that is not releva
Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data c
With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pa