ترغب بنشر مسار تعليمي؟ اضغط هنا

Setting the stage for data science: integration of data management skills in introductory and second courses in statistics

96   0   0.0 ( 0 )
 نشر من قبل Nicholas Horton
 تاريخ النشر 2015
والبحث باللغة English




اسأل ChatGPT حول البحث

Many have argued that statistics students need additional facility to express statistical computations. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically. In an era of increasingly big data, it is imperative that students develop data-related capacities, beginning with the introductory course. We believe that the integration of these precursors to data science into our curricula-early and often-will help statisticians be part of the dialogue regarding Big Data and Big Questions.



قيم البحث

اقرأ أيضاً

The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completio n in online courses can be used to characterise personal and group learners behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequence trajectories of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an on-line Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probablistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.
The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fai rness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.
86 - J-M. Le Goff 1998
At a time when many companies are under pressure to reduce times-to-market the management of product information from the early stages of design through assembly to manufacture and production has become increasingly important. Similarly in the constr uction of high energy physics devices the collection of (often evolving) engineering data is central to the subsequent physics analysis. Traditionally in industry design engineers have employed Engineering Data Management Systems (also called Product Data Management Systems) to coordinate and control access to document
In network analysis, many community detection algorithms have been developed, however, their implementation leaves unaddressed the question of the statistical validation of the results. Here we present robin(ROBustness In Network), an R package to as sess the robustness of the community structure of a network found by one or more methods to give indications about their reliability. The procedure initially detects if the community structure found by a set of algorithms is statistically significant and then compares two selected detection algorithms on the same graph to choose the one that better fits the network of interest. We demonstrate the use of our package on the American College Football benchmark dataset.
Cloud platform came into existence primarily to accelerate IT delivery and to promote innovation. To this point, it has performed largely well to the expectations of technologists, businesses and customers. The service aspect of this technology has p aved the road for a faster set up of infrastructure and related goals for both startups and established organizations. This has further led to quicker delivery of many user-friendly applications to the market while proving to be a commercially viable option to companies with limited resources. On the technology front, the creation and adoption of this ecosystem has allowed easy collection of massive data from various sources at one place, where the place is sometimes referred as just the cloud. Efficient data mining can be performed on raw data to extract potentially useful information, which was not possible at this scale before. Targeted advertising is a common example that can help businesses. Despite these promising offerings, concerns around security and privacy of user information suppressed wider acceptance and an all-encompassing deployment of the cloud platform. In this paper, we discuss security and privacy concerns that occur due to data exchanging hands between a cloud servicer provider (CSP) and the primary cloud user - the data collector, from the content generator. We offer solutions that encompass technology, policy and sound management of the cloud service asserting that this approach has the potential to provide a holistic solution.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا