An on-line drilling system, the tutor-web, has been developed and used for teaching mathematics and statistics. The system was used in a basic calculus course with 182 students. The students were requested to answer quiz questions in the tutor-web and were therefore monitored continuously during the semester. The data available are grades on a status exam conducted at the beginning of the course, the final grade, and the data gathered in the tutor-web system. A classification of the students is proposed using the data gathered in the system: a Good student should be able to solve a problem quickly and get it right; the diligent, hard-working Learner may take longer to get the right answer; a guessing (Poor) student will not take long to get the wrong answer; and the remaining (Unclassified) apparently non-learning students take a long time to get the wrong answer, resulting in the simple GLUP classification. The Poor students were found to show the least improvement, defined as the change in grade from the status exam to the final exam, while the Learners were found to improve the most. The results are used to demonstrate how further experiments are needed and can be designed, as well as to indicate how such a system needs to be developed further to accommodate these experiments.
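As a minimal sketch of how a GLUP rule of this kind could be coded, the snippet below labels a student from two summary statistics, median response time and proportion of correct answers; the field names and the cutoff values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical GLUP classifier from per-student quiz summaries.
# The cutoffs and argument names are illustrative, not the paper's.

def glup_label(median_seconds: float, prop_correct: float,
               fast_cutoff: float = 60.0, correct_cutoff: float = 0.5) -> str:
    fast = median_seconds <= fast_cutoff
    right = prop_correct >= correct_cutoff
    if fast and right:
        return "Good"          # quick and correct
    if not fast and right:
        return "Learner"       # slower, but gets the right answer
    if fast and not right:
        return "Poor"          # quick guesses, often wrong
    return "Unclassified"      # slow and wrong


print(glup_label(45.0, 0.8))   # -> "Good"
print(glup_label(180.0, 0.7))  # -> "Learner"
```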
The Majorana Demonstrator is an experiment constructed to search for neutrinoless double-beta decay in germanium-76 and to demonstrate the feasibility of deploying a ton-scale experiment in a phased and modular fashion. It consists of two modular arrays of natural and 76Ge-enriched germanium p-type point-contact detectors totaling 44.1 kg, located at the 4850-foot level of the Sanford Underground Research Facility in Lead, South Dakota, USA. The Demonstrator uses custom high-voltage cables to bias the detectors, as well as custom signal cables and connectors to read out the charge deposited at the point contact of each detector. These low-mass cables and connectors must meet stringent radiopurity requirements while being subjected to thermal and mechanical stress. A number of issues have been identified with the currently installed cables and connectors. An improved set of cables and connectors for the Majorana Demonstrator is being developed with the aim of increasing their overall reliability and connectivity. We will discuss some of the issues encountered with the current cables and connectors, as well as our improved designs and their initial performance.
Suppose we have a Bayesian model that combines evidence from several different sources. We want to know which model parameters most affect the estimate or decision from the model, or which of the parameter uncertainties drive the decision uncertainty. Furthermore, we want to prioritise what further data should be collected. These questions can be addressed by Value of Information (VoI) analysis, in which we estimate the expected reductions in loss from learning specific parameters or from collecting data of a given design. We describe the theory and practice of VoI for Bayesian evidence synthesis, using and extending ideas from health economics, computer modelling and Bayesian design. The methods apply to a range of decision problems, including point estimation and choices between discrete actions. We apply them to a model for estimating the prevalence of HIV infection, combining indirect information from several surveys, registers and expert beliefs. This analysis shows which parameters contribute most of the uncertainty about each prevalence estimate, and provides the expected improvements in precision from collecting specific amounts of additional data.
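To illustrate the kind of calculation that underlies such an analysis, here is a hedged sketch of the expected value of partial perfect information (EVPPI) for one parameter, computed by nested Monte Carlo on a toy two-action decision; the net-benefit form, the parameter names phi and psi, and the independence of the two parameters are assumptions for the example, not the HIV evidence-synthesis model of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior draws for two parameters (stand-ins for evidence-synthesis
# parameters); in practice these would be MCMC draws from the fitted model.
n_outer, n_inner = 2000, 2000
phi = rng.normal(0.10, 0.03, size=n_outer)              # parameter of interest
psi = rng.normal(2.00, 0.50, size=(n_outer, n_inner))   # remaining uncertainty
# psi is drawn independently of phi here, so the same inner draws can be
# reused for every outer draw of phi.

def net_benefit(phi, psi):
    """Assumed net benefit of the 'intervene' action; 'do nothing' scores 0."""
    return 1000.0 * phi - 20.0 * psi

nb = net_benefit(phi[:, None], psi)                      # (n_outer, n_inner)

# Current decision: pick the action with the highest expected net benefit.
baseline = max(0.0, nb.mean())

# EVPPI for phi: condition on each outer draw of phi, average over the
# remaining uncertainty in psi, choose the best action, then average over phi.
conditional_best = np.maximum(0.0, nb.mean(axis=1))
evppi_phi = conditional_best.mean() - baseline
print(f"EVPPI(phi) = {evppi_phi:.2f}")
```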
Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Based on our experience, this paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, but also provide more detailed analysis of individual sub-questions, including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data-generation approach, these methods provide richer and more informative summaries that enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.
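A minimal sketch of the kind of GLM such a post-competition analysis might fit, assuming a hypothetical long-format table with one row per (competitor, scenario run) and a binary success indicator; the column names, scenario types, and simulated success probabilities are illustrative only, not the radiation-detection competition data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical results table: one row per (competitor, scenario run).
competitors = np.repeat(["A", "B", "C"], 300)
scenarios = np.tile(["street", "highway", "shielded"], 300)
base = {"street": 0.8, "highway": 0.6, "shielded": 0.3}   # scenario difficulty
skill = {"A": 0.0, "B": -0.1, "C": 0.1}                   # competitor effect
p = np.clip([base[s] + skill[c] for c, s in zip(competitors, scenarios)], 0.05, 0.95)
results = pd.DataFrame({"competitor": competitors,
                        "scenario": scenarios,
                        "success": rng.binomial(1, p)})

# Logistic GLM: log-odds of success by competitor and scenario type, separating
# algorithm effects from universally easy or hard regions of the input space.
model = smf.glm("success ~ C(competitor) + C(scenario)", data=results,
                family=sm.families.Binomial()).fit()
print(model.summary())
```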
Built environment features (BEFs) are aspects of the human-constructed environment that may support or restrict health-related behaviors and thus impact health. In this paper we are interested in understanding whether the spatial distribution and quantity of fast food restaurants (FFRs) influence the risk of obesity in schoolchildren. To achieve this goal, we propose a two-stage Bayesian hierarchical modeling framework. In the first stage, examining the position of FFRs relative to that of some reference locations - in our case, schools - we model the distances of FFRs from these reference locations as realizations of inhomogeneous Poisson processes (IPPs). With the goal of identifying representative spatial patterns of exposure to FFRs, we model the intensity functions of the IPPs from a Bayesian non-parametric viewpoint, specifying a Nested Dirichlet Process prior. The second-stage model relates exposure patterns to obesity, offering two different approaches to accommodate uncertainty in the exposure patterns estimated in the first stage: in the first approach, the odds of obesity at the school level are regressed on cluster indicators, each representing a major pattern of exposure to FFRs; in the second, we employ Bayesian Kernel Machine regression to relate the odds of obesity to a multivariate vector reporting the degree of similarity of a given school to all other schools. Our analysis of the influence of FFR occurrence patterns on obesity among Californian schoolchildren indicates that, in 2010, among schools consistently assigned to a cluster, the odds of obesity among 9th graders were lower at schools whose FFR occurrences within a 1-mile radius were most distant, compared with other schools.
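A schematic of the two-stage structure described above, written in our own notation as a sketch: the symbols $d_{sj}$, $m_s$, $\lambda_s$, $z_s$, $\beta$ and the binomial form of the second stage are assumptions based on the description, not taken from the paper, and the nested Dirichlet process is written in its usual DP-of-DPs form.

```latex
% Stage 1: distances of FFRs from school s follow an inhomogeneous Poisson
% process; the intensities receive a nested Dirichlet process prior so that
% schools sharing a distributional cluster share an exposure pattern.
\begin{align*}
  d_{s1},\dots,d_{s m_s} \mid \lambda_s &\sim \mathrm{IPP}(\lambda_s),\\
  \lambda_s &\sim G_s, \qquad G_s \sim Q, \qquad
  Q \sim \mathrm{DP}\bigl(\alpha,\ \mathrm{DP}(\gamma, H)\bigr).\\[4pt]
% Stage 2 (cluster-indicator approach): school-level obesity counts regressed
% on indicators z_s of the exposure cluster recovered in Stage 1.
  y_s \mid p_s &\sim \mathrm{Binomial}(n_s, p_s), \qquad
  \operatorname{logit}(p_s) = \beta_0 + \boldsymbol{\beta}^{\top} \boldsymbol{z}_s.
\end{align*}
```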
Large graphs are natural mathematical models for describing the structure of data in a wide variety of fields, such as web mining, social networks, information retrieval and biological networks. For all these applications, automatic tools are required to obtain a summary view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self-Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph built directly from a large corpus of agrarian contracts.
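To make the kernelised approach concrete, here is a compact sketch under our own assumptions: a random toy graph stands in for the medieval network, the kernel is taken to be the heat kernel of the combinatorial Laplacian, and the map is a small 1-D grid with a Gaussian neighbourhood; the paper's exact kernel choice and map configuration are not specified in the abstract.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Toy graph: adjacency matrix of a small random graph (stand-in for the
# social network built from the corpus of agrarian contracts).
n = 60
A = np.triu(rng.random((n, n)) < 0.08, 1).astype(float)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A            # combinatorial graph Laplacian

# Kernel derived from the Laplacian: heat kernel K = exp(-beta * L) (assumed).
K = expm(-0.5 * L)

# Batch kernel SOM: each prototype is a convex combination of vertices in
# feature space, m_j = sum_i gamma[j, i] * phi(x_i); all distances are
# computed from the kernel matrix alone.
n_units, n_iter = 8, 20
grid = np.arange(n_units)                 # 1-D map of units
gamma = rng.dirichlet(np.ones(n), size=n_units)
diagK = np.diag(K)

for t in range(n_iter):
    # Squared feature-space distance from every vertex to every prototype.
    d2 = (diagK[None, :] - 2 * gamma @ K
          + np.einsum("jl,jm,lm->j", gamma, gamma, K)[:, None])
    bmu = d2.argmin(axis=0)               # best matching unit per vertex
    sigma = 2.0 * (1 - t / n_iter) + 0.5  # shrinking neighbourhood radius
    h = np.exp(-((grid[:, None] - grid[bmu][None, :]) ** 2) / (2 * sigma ** 2))
    gamma = h / h.sum(axis=1, keepdims=True)   # batch update of prototype weights

print("cluster sizes:", np.bincount(bmu, minlength=n_units))
```

The assignment of vertices to map units gives the groups of tightly connected vertices, while the map topology orders those groups so that neighbouring units correspond to related parts of the graph.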