ترغب بنشر مسار تعليمي؟ اضغط هنا

Reconciling modern machine learning practice and the bias-variance trade-off

152   0   0.0 ( 0 )
 نشر من قبل Daniel Hsu
 تاريخ النشر 2018
والبحث باللغة English




اسأل ChatGPT حول البحث

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This double descent curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.



قيم البحث

اقرأ أيضاً

Modern Monte Carlo-type approaches to dynamic decision problems are reformulated as empirical loss minimization, allowing direct applications of classical results from statistical machine learning. These computational methods are then analyzed in thi s framework to demonstrate their effectiveness as well as their susceptibility to generalization error. Standard uses of classical results prove potential overlearning, thus bias-variance trade-off, by connecting over-trained networks to anticipating controls. On the other hand, non-asymptotic estimates based on Rademacher complexity show the convergence of these algorithms for sufficiently large training sets. A numerically studied stylized example illustrates these possibilities, including the importance of problem dimension in the degree of overlearning, and the effectiveness of this approach.
Learning algorithms need bias to generalize and perform better than random guessing. We examine the flexibility (expressivity) of biased algorithms. An expressive algorithm can adapt to changing training data, altering its outcome based on changes in its input. We measure expressivity by using an information-theoretic notion of entropy on algorithm outcome distributions, demonstrating a trade-off between bias and expressivity. To the degree an algorithm is biased is the degree to which it can outperform uniform random sampling, but is also the degree to which is becomes inflexible. We derive bounds relating bias to expressivity, proving the necessary trade-offs inherent in trying to create strongly performing yet flexible algorithms.
84 - Ou Wu , Weiyao Zhu , Yingjun Deng 2021
A common assumption in machine learning is that samples are independently and identically distributed (i.i.d). However, the contributions of different samples are not identical in training. Some samples are difficult to learn and some samples are noi sy. The unequal contributions of samples has a considerable effect on training performances. Studies focusing on unequal sample contributions (e.g., easy, hard, noisy) in learning usually refer to these contributions as robust machine learning (RML). Weighing and regularization are two common techniques in RML. Numerous learning algorithms have been proposed but the strategies for dealing with easy/hard/noisy samples differ or even contradict with different learning algorithms. For example, some strategies take the hard samples first, whereas some strategies take easy first. Conducting a clear comparison for existing RML algorithms in dealing with different samples is difficult due to lack of a unified theoretical framework for RML. This study attempts to construct a mathematical foundation for RML based on the bias-variance trade-off theory. A series of definitions and properties are presented and proved. Several classical learning algorithms are also explained and compared. Improvements of existing methods are obtained based on the comparison. A unified method that combines two classical learning strategies is proposed.
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Ove rparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
Intuitively, a scientist might assume that a more complex regression model will necessarily yield a better predictive model of experimental data. Herein, we disprove this notion in the context of extracting the proton charge radius from charge form f actor data. Using a Monte Carlo study, we show that a simpler regression model can in certain cases be the better predictive model. This is especially true with noisy data where the complex model will fit the noise instead of the physical signal. Thus, in order to select the appropriate regression model to employ, a clear technique should be used such as the Akaike information criterion or Bayesian information criterion, and ideally selected previous to seeing the results. Also, to ensure a reasonable fit, the scientist should also make regression quality plots, such as residual plots, and not just rely on a single criterion such as reduced chi2. When we apply these techniques to low four-momentum transfer cross section data, we find a proton radius that is consistent with the muonic Lamb shift results. While presented for the case of proton radius extraction, these concepts are applicable in general and can be used to illustrate the necessity of balancing bias and variance when building a regression model and validating results, ideas that are at the heart of modern machine learning algorithms.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا