
Anytime Tail Averaging

Added by Nicolas Le Roux
Publication date: 2019
Language: English





Tail averaging consists of averaging the last examples in a stream. Common techniques either have a memory requirement that grows with the number of samples to average, do not provide the average at every timestep, or do not accommodate growing windows. We propose two techniques with a low, constant memory cost that perform tail averaging while giving access to the average at every timestep. We also show how the accuracy of that average can be improved at the cost of increased memory consumption.
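To make the setting concrete, below is a minimal Python sketch of one simple way to approximate a tail average with bounded memory: restart a running average at geometrically spaced times and answer queries with the average whose start time best matches the desired tail. This illustrates the problem, not the paper's two proposed techniques; the class name and parameters are invented for the example.

import math

class GeometricTailAverager:
    """Approximate the average of the last alpha-fraction of a stream.

    Hypothetical sketch: running averages restart at geometrically
    spaced times t = 1, 2, 3, 5, 8, ...; a query returns the average
    whose start time is closest to (1 - alpha) * t. The number of
    stored averages is roughly log(1 / (1 - alpha)) / log(ratio),
    independent of t.
    """

    def __init__(self, alpha=0.5, ratio=1.5):
        self.alpha = alpha        # fraction of the stream to average
        self.ratio = ratio        # geometric spacing of restart times
        self.t = 0                # samples seen so far
        self.next_start = 1       # next restart time
        self.averages = []        # list of [start_time, count, mean]

    def update(self, x):
        self.t += 1
        if self.t >= self.next_start:
            self.averages.append([self.t, 0, 0.0])
            self.next_start = max(self.next_start + 1,
                                  math.ceil(self.next_start * self.ratio))
        for avg in self.averages:            # fold x into every active average
            avg[1] += 1
            avg[2] += (x - avg[2]) / avg[1]
        # drop averages that can never again be the best match
        target = (1.0 - self.alpha) * self.t
        while len(self.averages) > 1 and self.averages[1][0] <= target:
            self.averages.pop(0)

    def query(self):
        if not self.averages:
            raise ValueError("query() before any update()")
        target = (1.0 - self.alpha) * self.t
        return min(self.averages, key=lambda a: abs(a[0] - target))[2]

With alpha = 0.5, a query at t = 1000 returns an average whose window started near step 500, at the cost of storing only a handful of (start, count, mean) triples.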



Related research

204 - Lucas Caccia, Jing Xu, Myle Ott 2021
Classical machine learning frameworks assume access to a possibly large dataset in order to train a predictive model. In many practical applications, however, data does not arrive all at once, but in batches over time. This creates a natural trade-off between the accuracy of a model and the time to obtain it. A greedy predictor could produce non-trivial predictions by training on batches as soon as they become available, but it may also make sub-optimal use of future data. On the other hand, a tardy predictor could wait a long time to aggregate several batches into a larger dataset, but ultimately deliver much better performance. In this work, we consider such a streaming learning setting, which we dub "anytime learning at macroscale" (ALMA). It is an instance of anytime learning applied not at the level of a single chunk of data, but at the level of the entire sequence of large batches. We first formalize this learning setting, then introduce metrics to assess how well learners perform on a given task for a given memory and compute budget, and finally test several baseline approaches on standard benchmarks repurposed for anytime learning at macroscale. The general finding is that bigger models always generalize better. In particular, it is important to grow model capacity over time if the initial model is relatively small. Moreover, updating the model at an intermediate rate strikes the best trade-off between accuracy and the time to obtain a useful predictor.
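As an illustration of the greedy-versus-tardy trade-off this abstract describes, the sketch below contrasts the two extremes; train, model, and the batch format are placeholders for this example, not the ALMA benchmark's API.

def greedy_learner(model, stream, train):
    # train on each mega-batch the moment it arrives: a usable (if
    # possibly sub-optimal) predictor is available after every batch
    for batch in stream:
        train(model, batch)
        yield model

def tardy_learner(model, stream, train, wait=4):
    # aggregate `wait` mega-batches before each update: each update
    # sees more data, but improvements arrive much later; the stale
    # model is still yielded so a predictor exists at every step
    buffer = []
    for batch in stream:
        buffer.append(batch)
        if len(buffer) == wait:
            train(model, [x for b in buffer for x in b])
            buffer.clear()
        yield model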
Modern genomic studies are increasingly focused on discovering more and more genes associated with a health response. Traditional shrinkage priors are primarily designed to detect a handful of signals among tens of thousands of predictors. Across diverse sparsity regimes, the nature of signal detection is tied to the tail behaviour of the prior. A desirable tail behaviour is the tail-adaptive shrinkage property, where the tail-heaviness of the prior adaptively becomes larger (or smaller) as the sparsity level increases (or decreases), to accommodate more (or fewer) signals. We propose a global-local-tail (GLT) Gaussian mixture distribution that ensures this property and provides accurate inference under diverse sparsity regimes. Incorporating a peaks-over-threshold method from extreme value theory, we develop an automated tail-learning algorithm for the GLT prior. We compare the performance of the GLT prior to the Horseshoe on two gene expression datasets and numerical examples. Results suggest that a varying tail rule is advantageous over a fixed tail rule across diverse sparsity regimes.
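For context, both the Horseshoe and, per the description above, the GLT prior sit in the general global-local scale-mixture family, sketched below; the specific GLT mixture components are defined in the paper itself.

\[
\beta_j \mid \lambda_j, \tau \;\sim\; \mathcal{N}\!\left(0,\, \lambda_j^2 \tau^2\right),
\qquad \lambda_j \sim \pi(\lambda_j), \qquad j = 1, \dots, p,
\]

where \(\tau\) is the global shrinkage scale and \(\lambda_j\) the local scale of coefficient \(\beta_j\). The Horseshoe fixes \(\pi\) to a half-Cauchy, \(\lambda_j \sim C^{+}(0, 1)\), whose tail-heaviness is the same at every sparsity level; the tail-adaptive property described above instead lets the tail of \(\pi\) become heavier (or lighter) as more (or fewer) signals are expected.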
87 - Guokun Chi, Min Jiang, Xing Gao 2019
Transfer learning techniques are widely used when it is difficult to obtain sufficient labeled data in the target domain but a large amount of auxiliary data is available in a relevant source domain. Most existing methods, however, operate on offline data, while practical applications often pose online learning problems in which data samples arrive sequentially. In this paper, we apply the ensemble approach to online transfer learning so that it can be used in an anytime setting. More specifically, we propose a novel online transfer learning framework that applies the idea of online bagging to anytime transfer learning problems, constructing strong classifiers through online iterations over multiple weak classifiers. Our algorithm also provides two extension schemes to reduce the impact of negative transfer. Experiments on three real datasets show the effectiveness of our proposed algorithms.
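The online bagging idea referenced above is commonly implemented in the style of Oza and Russell, where each stream example is shown to each weak learner a Poisson(1)-distributed number of times. The sketch below assumes weak learners expose a scikit-learn-style partial_fit/predict interface, which may differ from this paper's implementation.

import math
import random

class OnlineBagging:
    """Online bagging in the style of Oza and Russell: presenting each
    example k ~ Poisson(1) times to each learner simulates bootstrap
    resampling in a single pass over the stream."""

    def __init__(self, make_learner, n_learners=10):
        self.learners = [make_learner() for _ in range(n_learners)]

    @staticmethod
    def _poisson1():
        # Knuth's method for sampling Poisson(lambda = 1)
        threshold = math.exp(-1.0)
        k, p = 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                return k
            k += 1

    def update(self, x, y):
        for learner in self.learners:
            for _ in range(self._poisson1()):
                learner.partial_fit(x, y)    # assumed incremental-update API

    def predict(self, x):
        votes = [learner.predict(x) for learner in self.learners]
        return max(set(votes), key=votes.count)   # majority vote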
156 - Yilun Xu, Yang Song, Sahaj Garg 2021
Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to the data dimension. To address this difficulty, we propose a new family of autoregressive models that enables anytime sampling. Inspired by Principal Component Analysis, we learn a structured representation space where dimensions are ordered based on their importance with respect to reconstruction. Using an autoregressive model in this latent space, we trade off sample quality for computational efficiency by truncating the generation process before decoding into the original data space. Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model.
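A minimal sketch of the truncation step described above, where sample_next (drawing z_i from p(z_i | z_<i)) and decode are assumed interfaces rather than the released code's API; padding the untouched low-importance dimensions with a neutral value is likewise an assumption of this sketch.

def anytime_sample(sample_next, decode, total_dims, budget_dims, pad=0.0):
    # sequentially sample only the first budget_dims of the
    # importance-ordered latent dimensions, then stop early
    z = []
    for _ in range(budget_dims):
        z.append(sample_next(z))          # z_i ~ p(z_i | z_<i)
    # fill the remaining, least important dimensions with a neutral code
    z += [pad] * (total_dims - budget_dims)
    return decode(z)

Because the sequential part of the loop stops at budget_dims, shrinking the budget cuts sampling cost proportionally while the decoder still receives a full-length latent vector.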
Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. Given samples from a distribution, we seek to understand how many elements appear infrequently, that is, to characterize the tail of the distribution. We develop an algorithm based on a careful bucketing scheme that distinguishes heavy-tailed distributions from non-heavy-tailed ones via a definition based on the hazard rate under some natural smoothness and ordering assumptions. We verify our theoretical results empirically.
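A hedged sketch of the bucketing idea (not this paper's exact scheme or its hazard-rate test): histogram the per-element occurrence counts into geometric buckets, so a heavy tail shows up as slowly decaying mass in the low-count buckets.

from collections import Counter

def frequency_buckets(samples, base=2):
    # count occurrences of each element, then bucket the counts
    # geometrically: bucket k holds elements whose count c satisfies
    # base**k <= c < base**(k + 1)
    counts = Counter(samples)
    buckets = Counter()
    for c in counts.values():
        k = 0
        while base ** (k + 1) <= c:
            k += 1
        buckets[k] += 1
    return dict(sorted(buckets.items()))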
