Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Added by Daniel Y. Fu

Publication date 2020

Fields: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala

Machine Learning

Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
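The closed-form idea can be illustrated with a minimal sketch. It assumes three binary labeling sources with votes in {-1, +1}, no abstentions, conditional independence given the true label Y, and every source better than random. Under those assumptions E[λ_i λ_j] = a_i a_j, where a_i = E[λ_i Y], so each a_i can be recovered from three pairwise moments without any iterative optimization. The function names `triplet_accuracies` and `simulate_votes` are hypothetical and are not taken from the FlyingSquid codebase.

```python
import math
import random


def triplet_accuracies(votes):
    """Estimate a_i = E[lambda_i * Y] for three sources from votes alone.

    votes: list of (l1, l2, l3) tuples with entries in {-1, +1}.
    Assumption: sources are conditionally independent given Y and
    better than random, so the sign of each a_i is taken as positive.
    The pairwise moments must be non-zero for the ratios to be defined.
    """
    n = len(votes)
    # Empirical pairwise moments; under the assumptions, E[l_i * l_j] = a_i * a_j.
    m12 = sum(v[0] * v[1] for v in votes) / n
    m13 = sum(v[0] * v[2] for v in votes) / n
    m23 = sum(v[1] * v[2] for v in votes) / n
    # Solve the three equations in closed form: a_1^2 = m12 * m13 / m23, etc.
    a1 = math.sqrt(abs(m12 * m13 / m23))
    a2 = math.sqrt(abs(m12 * m23 / m13))
    a3 = math.sqrt(abs(m13 * m23 / m12))
    return a1, a2, a3


def simulate_votes(accs, n, seed=0):
    """Draw synthetic votes; source i agrees with Y with probability (1 + a_i) / 2."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n):
        y = rng.choice((-1, 1))  # hidden true label, never shown to the estimator
        votes.append(tuple(y if rng.random() < (1 + a) / 2 else -y for a in accs))
    return votes


if __name__ == "__main__":
    true_accs = (0.8, 0.6, 0.4)
    estimates = triplet_accuracies(simulate_votes(true_accs, 200_000))
    print("true:", true_accs)
    print("estimated:", tuple(round(a, 3) for a in estimates))
```

The estimator never sees Y; it uses only how often the sources agree with each other, which is what removes the need for an SGD loop in this simplified setting.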

Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

Mayee F. Chen, Daniel Y. Fu, Frederic Sala (2020)

Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks to extrapolate to unseen data points, which can take hours or days. Pre-trained embeddings can remove this requirement. We do not use the embeddings as features as in transfer learning (TL), which requires fine-tuning for high performance, but instead use them to define a distance function on the data and extend WS source votes to nearby points. Theoretically, we provide a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compare this rate to standard WS without extension and TL without fine-tuning. On six benchmark NLP and video tasks, our method outperforms WS without extension by 4.1 points, TL without fine-tuning by 12.8 points, and traditionally-supervised deep networks by 13.1 points, and comes within 0.7 points of state-of-the-art weakly-supervised deep networks, all while training in less than half a second.

Machine Learning

Speeding Up Computers

Janusz Kowalik, University of Gdansk (2016)

There are two distinct approaches to speeding up large parallel computers. The older approach uses General-Purpose Graphics Processing Units (GPGPUs). The newer one uses Many Integrated Core (MIC) technology. Here we focus on MIC technology and point out the differences between the two approaches to accelerating supercomputers. This is written from a user's perspective.

Distributed Parallel and Cluster Computing

Speeding up HMC with better integrators

M. A. Clark, A. D. Kennedy (2007)

We discuss how dynamical fermion computations may be made yet cheaper by using symplectic integrators that conserve energy much more accurately without decreasing the integration step size. We first explain why symplectic integrators exactly conserve a "shadow Hamiltonian" close to the desired one, and how this Hamiltonian may be computed in terms of Poisson brackets. We then discuss how classical mechanics may be implemented on Lie groups and derive the form of the Poisson brackets and force terms for some interesting integrators such as those making use of second derivatives of the action (Hessian or force gradient integrators). We hope that these will be seen to greatly improve energy conservation for only a small additional cost and that their use will significantly reduce the cost of dynamical fermion computations.

High Energy Physics - Lattice

On the Global Convergence of (Fast) Incremental Expectation Maximization Methods

Belhal Karimi, Hoi-To Wai, Eric Moulines (2019)

The EM algorithm is one of the most popular algorithms for inference in latent data models. The original formulation of the EM algorithm does not scale to large data sets, because the whole data set is required at each iteration of the algorithm. To alleviate this problem, Neal and Hinton have proposed an incremental version of the EM (iEM) in which at each iteration the conditional expectation of the latent data (E-step) is updated only for a mini-batch of observations. Another approach has been proposed by Cappe and Moulines in which the E-step is replaced by a stochastic approximation step, closely related to stochastic gradient. In this paper, we analyze incremental and stochastic versions of the EM algorithm as well as the variance-reduced version of Chen et al. in a common unifying framework. We also introduce a new incremental version, inspired by the SAGA algorithm by Defazio et al. We establish non-asymptotic convergence bounds for global convergence. Numerical applications are presented in this article to illustrate our findings.

Machine Learning, Methodology

Personalized Activity Recognition with Deep Triplet Embeddings

David M. Burns, Cari M. Whyne (2020)

A significant challenge for a supervised learning approach to inertial human activity recognition is the heterogeneity of data between individual users, resulting in very poor performance of impersonal algorithms for some subjects. We present an approach to personalized activity recognition based on deep embeddings derived from a fully convolutional neural network. We experiment with both categorical cross entropy loss and triplet loss for training the embedding, and describe a novel triplet loss function based on subject triplets. We evaluate these methods on three publicly available inertial human activity recognition data sets (MHEALTH, WISDM, and SPAR) comparing classification accuracy, out-of-distribution activity detection, and embedding generalization to new activities. The novel subject triplet loss provides the best performance overall, and all personalized deep embeddings out-perform our baseline personalized engineered feature embedding and an impersonal fully convolutional neural network classifier.
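For readers unfamiliar with triplet loss, the following is a minimal sketch of the standard margin-based form, max(0, d(anchor, positive) - d(anchor, negative) + margin), using Euclidean distance. It is not the subject triplet loss the abstract describes, whose details are not given here; the function names and the default margin are assumptions chosen for illustration, and some formulations use squared distances instead.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows linearly with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Positive is close and negative is far, so the margin is already satisfied.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Roles swapped: the "positive" is farther than the "negative", so the loss is positive.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```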

Machine Learning
