ﻻ يوجد ملخص باللغة العربية
Multi-label classification tasks such as OCR and multi-object recognition are a major focus of the growing machine learning as a service industry. While many multi-label prediction APIs are available, it is challenging for users to decide which API to use for their own data and budget, due to the heterogeneity in those APIs price and performance. Recent work shows how to select from single-label prediction APIs. However the computation complexity of the previous approach is exponential in the number of labels and hence is not suitable for settings like OCR. In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting users budget. The API selection problem is cast as an integer linear program, which we show has a special structure that we leverage to develop an efficient online API selector with strong performance guarantees. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Tencent and other providers for tasks including multi-label image classification, scene text recognition and named entity recognition. Across diverse tasks, FrugalMCT can achieve over 90% cost reduction while matching the accuracy of the best single API, or up to 8% better accuracy while matching the best APIs cost.
One of the key problems in multi-label text classification is how to take advantage of the correlation among labels. However, it is very challenging to directly model the correlations among labels in a complex and unknown label space. In this paper,
Multi-label classification (MLC) studies the problem where each instance is associated with multiple relevant labels, which leads to the exponential growth of output space. MLC encourages a popular framework named label compression (LC) for capturing
Perhaps surprisingly sewerage infrastructure is one of the most costly infrastructures in modern society. Sewer pipes are manually inspected to determine whether the pipes are defective. However, this process is limited by the number of qualified ins
We find that the way we choose to represent data labels can have a profound effect on the quality of trained models. For example, training an image classifier to regress audio labels rather than traditional categorical probabilities produces a more r
Extreme Multi-label classification (XML) is an important yet challenging machine learning task, that assigns to each instance its most relevant candidate labels from an extremely large label collection, where the numbers of labels, features and insta