Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision.
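To make the closed-form claim above concrete, the sketch below illustrates the triplet trick that FlyingSquid-style methods rely on: for binary labeling sources that are conditionally independent given the latent label Y ∈ {-1, +1}, the accuracy a_i = E[λ_i Y] of each source satisfies E[λ_i λ_j] = a_i a_j, so it can be recovered from pairwise agreement rates alone, with no SGD. This is an illustrative sketch under those independence assumptions, not the authors' implementation; the names triplet_accuracies and the synthetic setup are ours.

```python
import numpy as np

def triplet_accuracies(L):
    """Recover source accuracies a_i = E[lambda_i * Y] in closed form.

    L: (n_examples, 3) matrix of votes in {-1, +1} from three labeling
    sources assumed conditionally independent given the latent label Y.
    Uses E[l_i l_j] = a_i a_j, hence |a_i| = sqrt(M_ij * M_ik / M_jk).
    """
    M = (L.T @ L) / L.shape[0]   # empirical second moments E[l_i l_j]
    a = np.zeros(3)
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        a[i] = np.sqrt(abs(M[i, j] * M[i, k] / M[j, k]))
    return a  # positive root: assumes sources are better than random

# Tiny synthetic check: three sources with true accuracies 0.9, 0.7, 0.6.
rng = np.random.default_rng(0)
n, true_acc = 100_000, np.array([0.9, 0.7, 0.6])
Y = rng.choice([-1, 1], size=n)
flip = rng.random((n, 3)) < (1 - true_acc) / 2  # flip prob (1-a)/2 gives E[l_i Y] = a_i
L = np.where(flip, -Y[:, None], Y[:, None])
print(triplet_accuracies(L))  # approx [0.9, 0.7, 0.6]
```

Because each accuracy comes from a few empirical moments and a square root, the estimate costs one pass over the data, which is the source of the speedups quoted in the abstract.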
There are two distinct approaches to speeding up large parallel computers. The older uses General Purpose Graphics Processing Units (GPGPU); the newer uses Many Integrated Core (MIC) technology. Here we focus on the MIC technology.
We discuss how dynamical fermion computations may be made yet cheaper by using symplectic integrators that conserve energy much more accurately without decreasing the integration step size. We first explain why symplectic integrators exactly conserve a "shadow" Hamiltonian close to the desired one.
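As a concrete illustration of the conservation property just mentioned, the sketch below integrates a harmonic oscillator with leapfrog (Störmer-Verlet), the prototypical symplectic integrator: the energy error stays bounded and oscillates rather than drifting, because the scheme exactly conserves a shadow Hamiltonian close to H. This is a generic example of the phenomenon, not code from the paper.

```python
import numpy as np

def leapfrog(q, p, force, dt, n_steps):
    """Stormer-Verlet (leapfrog) integration: symplectic and time-reversible."""
    p = p + 0.5 * dt * force(q)      # initial half kick
    for _ in range(n_steps - 1):
        q = q + dt * p               # drift (unit mass)
        p = p + dt * force(q)        # full kick
    q = q + dt * p
    p = p + 0.5 * dt * force(q)      # final half kick
    return q, p

# Harmonic oscillator: H(q, p) = p^2/2 + q^2/2, so force = -dV/dq = -q.
H = lambda q, p: 0.5 * (p**2 + q**2)
q0, p0 = 1.0, 0.0
q, p = leapfrog(q0, p0, lambda q: -q, dt=0.1, n_steps=1000)
# The energy error is O(dt^2) and remains bounded for all times.
print(abs(H(q, p) - H(q0, p0)))
```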
The EM algorithm is one of the most popular algorithms for inference in latent data models. The original formulation of the EM algorithm does not scale to large data sets, because the whole data set is required at each iteration of the algorithm. To alleviate this problem, online and incremental variants update the parameter estimate from one observation (or a small batch) at a time.
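A minimal sketch of the online EM idea described above: rather than recomputing expected sufficient statistics over the full data set at each iteration, a running estimate of the statistics is updated after each observation with a decaying step size, and the M-step is applied to the running estimate. The two-component Gaussian mixture below (unit variances, for brevity) is our illustrative choice, not a specific algorithm from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stream: two-component Gaussian mixture with unit variance.
true_mu, true_w = np.array([-2.0, 3.0]), 0.4
n = 50_000
z = rng.random(n) < true_w
x_stream = np.where(z, true_mu[0], true_mu[1]) + rng.normal(size=n)

mu = np.array([-1.0, 1.0])   # initial means
w = np.array([0.5, 0.5])     # initial mixture weights
s_r = w.copy()               # running statistic E[r_k]
s_rx = w * mu                # running statistic E[r_k * x]

for t, x in enumerate(x_stream, start=1):
    # E-step for a single point: posterior responsibilities r_k.
    log_p = np.log(w) - 0.5 * (x - mu) ** 2
    r = np.exp(log_p - log_p.max())
    r /= r.sum()
    # Stochastic-approximation update of the sufficient statistics.
    gamma = 1.0 / t ** 0.6           # decaying step size
    s_r += gamma * (r - s_r)
    s_rx += gamma * (r * x - s_rx)
    # M-step on the running statistics.
    w = s_r / s_r.sum()
    mu = s_rx / s_r

print(w, mu)   # approx [0.4, 0.6] and [-2.0, 3.0]
```

Each update touches a single observation, so the per-step cost is constant in the data set size, which is exactly what the full-batch EM formulation lacks.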
A significant challenge for a supervised learning approach to inertial human activity recognition is the heterogeneity of data between individual users, resulting in very poor performance of impersonal algorithms for some subjects. We present an approach to personalizing recognition models to individual users.