
Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

Posted by Giles Hooker
Publication date: 2021
Paper language: English





In 2001, Leo Breiman wrote of a divide between data modeling and algorithmic modeling cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries. We argue that this is largely due to the data modelers incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking that Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.
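As a minimal, illustrative sketch of the blending the abstract describes (prediction plus some attempt at uncertainty), the snippet below fits a random forest and inspects the spread of per-tree predictions as a crude proxy for predictive variability. This is only a toy: the data, names, and the between-tree spread heuristic are assumptions for the example, not the inferential machinery (e.g., infinitesimal-jackknife confidence intervals) developed in the recent random forest literature.

```python
# Illustrative sketch: random forest prediction plus a crude look at
# between-tree variability. Not a valid confidence interval.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, size=500)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

X_new = rng.uniform(-2, 2, size=(5, 3))
# Predictions of every individual tree at the new points.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

point_pred = per_tree.mean(axis=0)   # the usual forest prediction
tree_spread = per_tree.std(axis=0)   # crude between-tree variability

for p, s in zip(point_pred, tree_spread):
    print(f"prediction {p:+.3f}  (between-tree sd {s:.3f})")
```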




Read also

Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the groundwork towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.
Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
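A hedged sketch of the maximum marginal likelihood idea mentioned in this abstract, using invented toy inputs (the logits, candidate programs, and consistency flags below are assumptions, not the paper's parser): the MML objective sums probability over all programs that execute to the correct answer, and its gradient weights each consistent program by its renormalised probability, which is what "spreading probability across consistent programs" amounts to.

```python
# Toy MML computation over a beam of candidate programs.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical parser scores for five candidate programs from beam search,
# and whether each program executes to the labelled answer.
logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0])
consistent = np.array([True, False, True, False, False])

p = softmax(logits)
marginal = p[consistent].sum()      # MML objective: total prob. of correct answer
loss = -np.log(marginal)

# In the MML gradient, each consistent program gets weight q(z) = p(z) / marginal;
# inconsistent programs get no positive weight.
q = np.where(consistent, p / marginal, 0.0)
print("marginal likelihood:", round(marginal, 4), "loss:", round(loss, 4))
print("per-program MML weights:", q.round(3))
```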
Jelena Bradic, Yinchu Zhu (2021)
Breiman challenged statisticians to think more broadly, to step into the unknown, model-free learning world, with him paving the way forward. The statistics community responded with slight optimism, some skepticism, and plenty of disbelief. Today, we are at the same crossroad anew. Faced with the enormous practical success of model-free, deep, and machine learning, we are naturally inclined to think that everything is resolved. A new frontier has emerged; one where the role, impact, or stability of the learning algorithms is no longer measured by prediction quality alone, but by an inferential one -- asking the questions of why and if can no longer be safely ignored.
The role of probability appears unchallenged as the key measure of uncertainty, used among other things for practical induction in the empirical sciences. Yet, Popper was emphatic in his rejection of inductive probability and of the logical probability of hypotheses; furthermore, for him, the degree of corroboration cannot be a probability. Instead he proposed a deductive method of testing. In many ways this dialectic tension has many parallels in statistics, with the Bayesians on the logico-inductive side vs the non-Bayesians or the frequentists on the other side. Simplistically Popper seems to be on the frequentist side, but recent synthesis on the non-Bayesian side might direct the Popperian views to a more nuanced destination. Logical probability seems perfectly suited to measure partial evidence or support, so what can we use if we are to reject it? For the past 100 years, statisticians have also developed a related concept called likelihood, which has played a central role in statistical modelling and inference. Remarkably, this Fisherian concept of uncertainty is largely unknown or at least severely under-appreciated in non-statistical literature. As a measure of corroboration, the likelihood satisfies the Popperian requirement that it is not a probability. Our aim is to introduce the likelihood and its recent extension via a discussion of two well-known logical fallacies in order to highlight that its lack of recognition may have led to unnecessary confusion in our discourse about falsification and corroboration of hypotheses. We highlight the 100 years of development of likelihood concepts. The year 2021 will mark the 100-year anniversary of the likelihood, so with this paper we wish it a long life and increased appreciation in non-statistical literature.
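A small worked illustration of the point that a likelihood is not a probability (the binomial numbers below are invented for the example, not taken from the paper): the likelihood curve over the parameter need not integrate to one, and only likelihood ratios carry evidential weight.

```python
# Binomial likelihood for k = 7 successes in n = 10 trials:
# L(theta) = theta^k * (1 - theta)^(n - k).
import numpy as np

n, k = 10, 7
theta = np.linspace(0.0, 1.0, 10001)
lik = theta**k * (1 - theta)**(n - k)

d_theta = theta[1] - theta[0]
area = lik.sum() * d_theta            # approx Beta(k+1, n-k+1), far from 1
mle = theta[np.argmax(lik)]           # maximum-likelihood estimate, about 0.7
ratio = lik[np.searchsorted(theta, 0.5)] / lik.max()   # support for 0.5 vs MLE

print(f"integral of L(theta) over [0,1]: {area:.6f} (not 1, so not a density)")
print(f"MLE: {mle:.2f}; likelihood ratio L(0.5)/L(MLE): {ratio:.3f}")
```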
Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of models: mechanistic models derived from scientific theories (e.g., ODE/SDE simulators). Mechanistic models encode application-specific scientific knowledge about the data. And while these categories represent extreme points in model space, modern computational and algorithmic tools enable us to interpolate between these points, producing flexible, interpretable, and scientifically-informed hybrids that can enjoy accurate and robust predictions, and resolve issues with data analysis that Breiman describes, such as the Rashomon effect and Occam's dilemma. Challenges still remain in finding an appropriate point in model space, with many choices on how to compose model components and the degree to which each component informs inferences.
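One simple way to make "interpolating between mechanistic and algorithmic models" concrete is a residual-correction hybrid; the sketch below is an assumption-laden toy (the decay model, data, and learner are invented for the example), not a method proposed in the abstract. A mechanistic simulator supplies the scientific structure, and a flexible learner is fit to the discrepancy between the simulator and the observed data.

```python
# Toy hybrid: mechanistic exponential decay plus a learned residual correction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 200)
# "Observed" data: decay plus a periodic effect the simulator does not know about.
y = 3.0 * np.exp(-0.8 * t) + 0.2 * np.sin(4 * t) + rng.normal(0, 0.05, t.size)

def mechanistic(t, y0=3.0, rate=0.8):
    """Closed-form solution of dy/dt = -rate * y, the assumed science."""
    return y0 * np.exp(-rate * t)

# Fit a flexible model to the residual discrepancy.
residual_model = GradientBoostingRegressor().fit(t[:, None], y - mechanistic(t))

def hybrid(t_new):
    return mechanistic(t_new) + residual_model.predict(t_new[:, None])

print("mechanistic RMSE:", np.sqrt(np.mean((y - mechanistic(t)) ** 2)).round(3))
print("hybrid RMSE:     ", np.sqrt(np.mean((y - hybrid(t)) ** 2)).round(3))
```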


