
Comments on Leo Breiman's paper "Statistical Modeling: The Two Cultures" (Statistical Science, 2001, 16(3), 199-231)


Added by Jelena Bradic

Publication date: 2021

Fields: Mathematical Statistics

Language: English

Authors: Jelena Bradic, Yinchu Zhu

Machine Learning Methodology


Breiman challenged statisticians to think more broadly, to step into the unknown, model-free learning world, with him paving the way forward. The statistics community responded with slight optimism, some skepticism, and plenty of disbelief. Today, we are at the same crossroads anew. Faced with the enormous practical success of model-free, deep, and machine learning, we are naturally inclined to think that everything is resolved. A new frontier has emerged: one where the role, impact, or stability of the learning algorithms is measured no longer by prediction quality but by inferential quality, and where the questions of "why" and "if" can no longer be safely ignored.


Breiman's two cultures: You don't have to choose sides

Andrew C. Miller, Nicholas J. Foti, Emily B. Fox, 2021

Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of models: mechanistic models derived from scientific theories (e.g., ODE/SDE simulators). Mechanistic models encode application-specific scientific knowledge about the data. And while these categories represent extreme points in model space, modern computational and algorithmic tools enable us to interpolate between these points, producing flexible, interpretable, and scientifically informed hybrids that can enjoy accurate and robust predictions and resolve issues with data analysis that Breiman describes, such as the Rashomon effect and Occam's dilemma. Challenges remain in finding an appropriate point in model space, with many choices on how to compose model components and on the degree to which each component informs inferences.

Machine Learning Machine Learning Methodology

Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

Lucas Mentch, Giles Hooker, 2021

In 2001, Leo Breiman wrote of a divide between data modeling and algorithmic modeling cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps and in terms of intellectual boundaries. We argue that this is largely due to the data modelers incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.

Other Statistics Machine Learning

Comments on "Two Cultures": What have changed over 20 years?

Xuming He, Jingshen Wang, 2021

Twenty years ago Breiman (2001) called to our attention a significant cultural division in modeling and data analysis between the stochastic data models and the algorithmic models. Out of his deep concern that the statistical community was so deeply and almost exclusively committed to the former, Breiman warned that we were losing our abilities to solve many real-world problems. Breiman was not the first, and certainly not the only statistician, to sound the alarm; we may refer to none other than John Tukey, who wrote almost 60 years ago that "data analysis is intrinsically an empirical science." However, the bluntness and timeliness of Breiman's article made it uniquely influential. It prepared us for the data science era and encouraged a new generation of statisticians to embrace a more broadly defined discipline. Some might argue that "The cultural division between these two statistical learning frameworks has been growing at a steady pace in recent years", to quote Mukhopadhyay and Wang (2020). In this commentary, we focus on some of the positive changes over the past 20 years and offer an optimistic outlook for our profession.

Other Statistics

On the Statistical Efficiency of Compositional Nonparametric Prediction

Yixi Xu, Jean Honorio, Xiao Wang, 2017

In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log(pq)-\log(k!))$. We further propose a greedy algorithm for regression in order to validate our theoretical findings through synthetic experiments.

Machine Learning Machine Learning

Statistical estimation for optimization problems on graphs

Mikhail Langovoy, Suvrit Sra, 2013

Large graphs abound in machine learning, data mining, and several related areas. A useful step towards analyzing such graphs is that of obtaining certain summary statistics, e.g., the expected length of a shortest path between two nodes or the expected weight of a minimum spanning tree of the graph. These statistics provide insight into the structure of a graph, and they can help predict global properties of a graph. Motivated thus, we propose to study statistical properties of structured subgraphs (of a given graph), in particular, to estimate the expected objective function value of a combinatorial optimization problem over these subgraphs. The general task is very difficult, if not unsolvable; so for concreteness we describe a more specific statistical estimation problem based on spanning trees. We hope that our position paper encourages others to also study other types of graphical structures for which one can prove nontrivial statistical estimates.

Machine Learning Discrete Mathematics Optimization and Control
