
On Anytime Learning at Macroscale

Published by Lucas Caccia
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Classical machine learning frameworks assume access to a possibly large dataset in order to train a predictive model. In many practical applications, however, data does not arrive all at once, but in batches over time. This creates a natural trade-off between the accuracy of a model and the time to obtain it. A greedy predictor could produce non-trivial predictions by training on batches as soon as they become available, but it may make sub-optimal use of future data. A tardy predictor, on the other hand, could wait a long time to aggregate several batches into a larger dataset, but ultimately deliver much better performance. In this work, we consider such a streaming learning setting, which we dub anytime learning at macroscale (ALMA). It is an instance of anytime learning applied not at the level of a single chunk of data, but at the level of the entire sequence of large batches. We first formalize this learning setting, then introduce metrics to assess how well learners perform on the given task for a given memory and compute budget, and finally test several baseline approaches on standard benchmarks repurposed for anytime learning at macroscale. The general finding is that bigger models always generalize better. In particular, it is important to grow model capacity over time if the initial model is relatively small. Moreover, updating the model at an intermediate rate strikes the best trade-off between accuracy and the time needed to obtain a useful predictor.
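The greedy-versus-tardy trade-off described above can be made concrete with a small simulation. The following is a minimal sketch of the setting, not the paper's code or benchmarks: a synthetic stream is split into mega-batches, and a learner either updates on every arrival or waits to aggregate several mega-batches before updating. The scikit-learn SGDClassifier and the synthetic data are illustrative assumptions.

# Minimal sketch of the ALMA setting: data arrives as a sequence of
# mega-batches, and a learner chooses how many of them to aggregate
# before each model update. Illustrative only, not the paper's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_test, y_test = X[15_000:], y[15_000:]
stream = np.array_split(np.arange(15_000), 10)   # 10 mega-batches

def run(update_every):
    """Update the model every `update_every` mega-batches; return the
    test accuracy available after each arrival ("anytime" accuracy)."""
    model = SGDClassifier(random_state=0)
    buffer, accs = [], []
    for t, idx in enumerate(stream, start=1):
        buffer.append(idx)
        if t % update_every == 0:                # time to update
            ids = np.concatenate(buffer)
            model.partial_fit(X[ids], y[ids], classes=np.unique(y))
            buffer = []
        # accuracy of whatever model exists at this point in the stream
        accs.append(model.score(X_test, y_test) if hasattr(model, "coef_") else 0.0)
    return accs

greedy = run(update_every=1)   # trains on every mega-batch immediately
tardy = run(update_every=5)    # waits to aggregate several mega-batches
print("greedy:", [f"{a:.2f}" for a in greedy])
print("tardy :", [f"{a:.2f}" for a in tardy])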


Read also

Guokun Chi, Min Jiang, Xing Gao (2019)
Transfer learning techniques are widely used because it is often difficult to obtain sufficient labeled data in the target domain, while a large amount of auxiliary data is available in a related source domain. However, most existing methods are based on offline data. In practical applications, it is often necessary to face online learning problems in which data samples arrive sequentially. In this paper, we apply an ensemble approach to online transfer learning so that it can be used in an anytime setting. More specifically, we propose a novel online transfer learning framework that applies the idea of online bagging to anytime transfer learning problems and constructs a strong classifier by iteratively weighting multiple weak classifiers online according to their usefulness. Furthermore, our algorithm provides two extension schemes to reduce the impact of negative transfer. Experiments on three real data sets show the effectiveness of our proposed algorithms.
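For intuition, here is a hedged sketch of the online-bagging mechanism this abstract builds on (in the style of Oza and Russell): each incoming example is shown to each weak learner a Poisson(1)-distributed number of times, and predictions are combined by majority vote. It illustrates the ensemble idea only; the authors' transfer-learning framework, its usefulness weighting, and its negative-transfer safeguards are not reproduced here. The class name and the SGDClassifier base learner are assumptions.

# Sketch of online bagging: not the authors' algorithm, only the
# generic ensemble mechanism it builds on.
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineBagging:
    def __init__(self, n_learners=10, classes=(0, 1), seed=0):
        self.rng = np.random.RandomState(seed)
        self.classes = np.array(classes)
        self.learners = [SGDClassifier(random_state=i) for i in range(n_learners)]

    def partial_fit(self, x, y):
        x = x.reshape(1, -1)
        for clf in self.learners:
            k = self.rng.poisson(1.0)            # how many times this learner sees x
            for _ in range(k):
                clf.partial_fit(x, [y], classes=self.classes)

    def predict(self, x):
        votes = [int(clf.predict(x.reshape(1, -1))[0])
                 for clf in self.learners if hasattr(clf, "coef_")]
        # majority vote over the learners that have been trained so far
        return np.bincount(votes).argmax() if votes else int(self.classes[0])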
Nicolas Le Roux (2019)
Tail averaging consists in averaging the last examples in a stream. Existing techniques either have a memory requirement that grows with the number of samples to average, are not available at every time step, or do not accommodate growing windows. We propose two techniques with a low constant memory cost that perform tail averaging with access to the average at every time step. We also show how one can improve the accuracy of that average at the cost of increased memory consumption.
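As an illustration of the kind of constant-memory, anytime estimate this abstract refers to, the sketch below tracks a growing tail window with an exponential moving average whose decay is tied to a target tail fraction. This is not one of the paper's two techniques; the class name and the decay schedule are assumptions made for the example.

# Illustrative only: an O(1)-memory, anytime approximation to tail
# averaging via an exponential moving average with a growing window.
class TailEMA:
    def __init__(self, tail_fraction=0.2):
        self.tail_fraction = tail_fraction  # fraction of the stream to average over
        self.count = 0
        self.avg = 0.0

    def update(self, x):
        self.count += 1
        # effective window grows with the stream: ~tail_fraction * samples seen
        window = max(1.0, self.tail_fraction * self.count)
        beta = 2.0 / (window + 1.0)
        self.avg += beta * (x - self.avg)
        return self.avg  # available at every time step, constant memory

ema = TailEMA(tail_fraction=0.2)
for t in range(1, 1001):
    estimate = ema.update(float(t))
# the estimate roughly tracks the mean of the most recent ~20% of the stream
print(f"anytime tail estimate after 1000 steps: {estimate:.1f}")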
Diffusiophoresis, a ubiquitous phenomenon that induces particle transport whenever solute concentration gradients are present, was recently observed in the context of microsystems and shown to strongly impact colloidal transport (patterning and mixing) at such scales. In the present work, we show experimentally that this nanoscale mechanism can induce changes in the macroscale mixing of colloids by chaotic advection. Rather than the decay of the standard deviation of concentration, a global parameter commonly employed in studies of mixing, we use multiscale tools adapted from studies of chaotic flows and intermittent turbulent mixing: concentration spectra and the second and fourth moments of the probability density functions of scalar gradients. Not only can these tools be used in open flows, but they also allow for scale-by-scale analysis. Strikingly, diffusiophoresis is shown to affect all scales, although particularly the small ones, resulting in a change of scalar intermittency and in an unusual scale bridging spanning more than seven orders of magnitude. By quantifying the averaged impact of diffusiophoresis on macroscale mixing, we explain why the observed effects are consistent with the introduction of an effective Péclet number.
Neural networks notoriously suffer from catastrophic forgetting, the phenomenon of forgetting past knowledge when acquiring new knowledge. Overcoming catastrophic forgetting is of significant importance for emulating incremental learning, where the model is capable of learning from sequential experience in an efficient and robust way. State-of-the-art techniques for incremental learning make use of knowledge distillation to prevent catastrophic forgetting. Therein, one updates the network while ensuring that its responses to previously seen concepts remain stable throughout updates. In practice, this is done by minimizing the dissimilarity between the current and previous responses of the network in one way or another. Our work contributes a novel method to the arsenal of distillation techniques. In contrast to the previous state of the art, we propose to first construct low-dimensional manifolds for the previous and current responses and then minimize the dissimilarity between the responses along the geodesic connecting the manifolds. This induces a more powerful knowledge distillation with smoothness properties that preserves past knowledge more effectively, as observed in our comprehensive empirical study.
Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial for agents and robots operating in changing environments that are required to acquire, fine-tune, and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity, and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures via the re-occurrence of (latent representations of) input sequences, and that functionally resemble mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform those without it in (online) incremental learning tasks.
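Since experience replay is central to the models surveyed in this abstract, a minimal sketch of one common replay component follows: a fixed-size buffer maintained by reservoir sampling, whose contents are interleaved with the incoming stream during single-pass training. Class and method names are illustrative and not tied to any specific model mentioned above.

# Sketch of an experience-replay buffer based on reservoir sampling:
# at any time it holds an approximately uniform sample of the stream.
import random

class ReservoirReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # keep the new example with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        k = min(k, len(self.buffer))
        return self.rng.sample(self.buffer, k)

# Usage in a single-pass OCL loop: train on each incoming example plus a
# few replayed ones to mitigate catastrophic forgetting.
buf = ReservoirReplayBuffer(capacity=500)
for example in ((f"x{i}", i % 10) for i in range(10_000)):
    batch = [example] + buf.sample(4)
    # model.update(batch)   # placeholder for the actual learner update
    buf.add(example)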
