No Arabic abstract
When students write programs, their program structure provides insight into their learning process. However, analyzing program structure by hand is time-consuming, and teachers need better tools for computer-assisted exploration of student solutions. As a first step towards an education-oriented program analysis toolkit, we show how supervised machine learning methods can automatically classify student programs into a predetermined set of high-level structures. We evaluate two models on classifying student solutions to the Rainfall problem: a nearest-neighbors classifier using syntax tree edit distance and a recurrent neural network. We demonstrate that these models can achieve 91% classification accuracy when trained on 108 programs. We further explore the generality, trade-offs, and failure cases of each model.
With the recent implementation of the K to 12 Program, academic institutions, specifically, Colleges and Universities in the Philippines have been faced with difficulties in determining projected freshmen enrollees vis-a-vis decision-making factors for efficient resource management. Enrollment targets directly impacts success factors of Higher Education Institutions. This study covered an analysis of various characteristics of freshmen applicants affecting their admission status in a Philippine university. A predictive model was developed using Logistic Regression to evaluate the probability that an admitted student will pursue to enroll in the Institution or not. The dataset used was acquired from the University Admissions Office. The office designed an online application form to capture applicants details. The online form was distributed to all student applicants, and most often, students, tend to provide incomplete information. Despite this fact, student characteristics, as well as geographic and demographic data based on the students location are significant predictors of enrollment decision. The results of the study show that given limited information about prospective students, Higher Education Institutions can implement machine learning techniques to supplement management decisions and provide estimates of class sizes, in this way, it will allow the institution to optimize the allocation of resources and will have better control over net tuition revenue.
Algorithmic discrimination is an important aspect when data is used for predictive purposes. This paper analyzes the relationships between discrimination and classification, data set partitioning, and decision models, as well as correlation. The paper uses real world data sets to demonstrate the existence of discrimination and the independence between the discrimination of data sets and the discrimination of classification models.
India accounts for 11% of maternal deaths globally where a woman dies in childbirth every fifteen minutes. Lack of access to preventive care information is a significant problem contributing to high maternal morbidity and mortality numbers, especially in low-income households. We work with ARMMAN, a non-profit based in India, to further the use of call-based information programs by early-on identifying women who might not engage on these programs that are proven to affect health parameters positively.We analyzed anonymized call-records of over 300,000 women registered in an awareness program created by ARMMAN that uses cellphone calls to regularly disseminate health related information. We built robust deep learning based models to predict short term and long term dropout risk from call logs and beneficiaries demographic information. Our model performs 13% better than competitive baselines for short-term forecasting and 7% better for long term forecasting. We also discuss the applicability of this method in the real world through a pilot validation that uses our method to perform targeted interventions.
Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech where citizens actively engage in hate-filled conversations to attempt to restore civil non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores on out of sample balanced test sets ranging from 0.76 to 0.97---accuracy in line and even exceeding the state of the art. On thousands of tweets, we used crowdsourcing to verify that the judgments made by the classifier are in close alignment with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully-resolved Twitter conversations occurring from 2013 to 2018 and study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.
Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process. Key insights: * Automation in data science aims to facilitate and transform the work of data scientists, not to replace them. * Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction. * Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.