The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science.
We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for integrating ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We present assessments of our efforts and conclude with next steps and final thoughts.
The field of data science currently enjoys a broad definition that includes a wide array of activities which borrow from many other established fields of study. Having such a vague characterization of a field in the early stages might be natural, but over time maintaining such a broad definition becomes unwieldy and impedes progress. In particular, the teaching of data science is hampered by the seeming need to cover many different points of interest. Data scientists must ultimately identify the core of the field by determining what makes the field unique and what it means to develop new knowledge in data science. In this review we attempt to distill some core ideas from data science by focusing on the iterative process of data analysis and develop some generalizations from past experience. Generalizations of this nature could form the basis of a theory of data science and would serve to unify and scale the teaching of data science to large audiences.
Donoho's JCGS (in press) paper is a spirited call to action for statisticians, who he points out are losing ground in the field of data science by refusing to accept that data science is its own domain (or, at least, a domain that is becoming distinctly defined). He calls on writings by John Tukey, Bill Cleveland, and Leo Breiman, among others, to remind us that statisticians have been dealing with data science for years, and encourages acceptance of the direction of the field while also ensuring that statistics is tightly integrated. As faculty at baccalaureate institutions (where the growth of undergraduate statistics programs has been dramatic), we are keen to ensure statistics has a place in data science and data science education. In his paper, Donoho is primarily focused on graduate education. At our undergraduate institutions, we are considering many of the same questions.
With advances in tools and rising popularity, Bayesian statistics is becoming more important for undergraduates. In this study, we surveyed whether an undergraduate Bayesian course is offered at each institution in our sample of 152 high-ranking research universities and liberal arts colleges. For each identified Bayesian course, we examined how it fits into the institution's undergraduate curricula, such as majors and prerequisites. Through a series of course syllabi analyses, we explored the topics covered, their popularity across these courses, and the teaching and learning tools adopted, such as software. This paper presents our findings on the current practices of Bayesian education at the undergraduate level. Based on our findings, we provide recommendations for programs that may consider offering Bayesian education to their students.
A learning environment, the tutor-web, has been developed and used for educational research. The system is accessible and free to use for anyone with access to the Web. It is based on open source software, and the teaching material is licensed under the Creative Commons Attribution-ShareAlike License. The system has been used for computer-assisted education in statistics and mathematics. It offers a unique way to structure and link together teaching material and includes interactive quizzes whose primary purpose is to increase learning rather than merely to evaluate. The system was used in a course on basic statistics at the University of Iceland in spring 2013. A randomized trial was conducted to investigate the difference in learning between students doing regular homework and students using the system. The difference between the groups was not found to be significant.