The widespread adoption of online courses opens opportunities for the analysis of learner behaviour and for the optimisation of web-based material adapted to observed usage. Here we introduce a mathematical framework for the analysis of time series collected from online engagement of learners, which allows the identification of clusters of learners with similar online behaviour directly from the data, i.e., the groups of learners are not pre-determined subjectively but emerge algorithmically from the analysis and the data. The method uses a dynamic time warping kernel to create a pairwise similarity between time series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to cluster groups of learners with similar patterns of behaviour. We showcase our approach on online engagement data of adult learners taking six web-based courses as part of a post-graduate degree at Imperial Business School. Our analysis identifies clusters of learners with statistically distinct patterns of engagement, ranging from distributed to massed learning, with different levels of adherence to pre-planned course structure and/or task completion, and also revealing outlier learners with highly sporadic behaviour. A posteriori comparison with performance showed that, although the majority of low-performing learners are part of the massed learning cluster, the high-performing learners are distributed across clusters with different traits of online engagement. We also show that our methodology is able to identify low-performing learners more accurately than common classification methods based on raw statistics extracted from the data.
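As a rough illustration of the pairwise-similarity step described above, the following sketch computes a classic dynamic time warping (DTW) distance between two numeric series and maps it to a similarity in (0, 1]. The exponential form of the kernel, the bandwidth `sigma`, and all function names are assumptions made for illustration; the abstract does not specify the exact kernel used, and the multiscale graph clustering step is not shown.

```python
from math import exp, inf


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_similarity(a, b, sigma=1.0):
    """Assumed kernel form: Gaussian-style decay of the DTW distance."""
    return exp(-dtw_distance(a, b) ** 2 / (2 * sigma ** 2))


def similarity_matrix(series, sigma=1.0):
    """Pairwise similarities; this matrix could serve as a weighted graph for clustering."""
    return [[dtw_similarity(s, t, sigma) for t in series] for s in series]
```

Two series that differ only by a local stretch in time, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have zero DTW distance and therefore similarity 1 under this sketch.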
We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on finding subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph generating algorithms, while the graphs in the third example are sampled as dynamic snapshots from an evolving network simulation. We further combine these approaches with equation-free techniques, demonstrating how such data mining approaches can enhance scientific computation of network evolution dynamics.
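The subgraph-density idea mentioned above can be sketched as follows: each graph is summarised by the densities of a few small subgraphs, and two graphs are compared through the Euclidean distance between these summaries. The choice of edges and triangles as the only subgraphs, the `(n, edges)` graph representation, and the function names are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def subgraph_densities(n, edges):
    """Return (edge density, triangle density) of a simple undirected graph on nodes 0..n-1.

    Assumes n >= 3 so that both normalising counts are non-zero.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    max_edges = n * (n - 1) / 2
    max_triangles = n * (n - 1) * (n - 2) / 6
    return (n_edges / max_edges, n_triangles / max_triangles)


def density_distance(g1, g2):
    """Euclidean distance between the density vectors of two graphs given as (n, edges)."""
    f1 = subgraph_densities(*g1)
    f2 = subgraph_densities(*g2)
    return sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))
```

Because the features are normalised densities, graphs with different numbers of nodes can be compared directly. The brute-force triangle count is cubic in the number of nodes and is only meant for small examples.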
The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise the behaviors of individual learners and groups of learners, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequence trajectories of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an online Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high- and low-performing learners. We find that high-performing learners follow the progression between weekly sessions more regularly than low-performing learners, yet within each weekly session high-performing learners are less tied to the nominal task order. We then model the sequences of high- and low-performing students using the probabilistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task-centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.
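One simple way to quantify how far a learner's task-completion sequence deviates from the nominal course order is to count pairwise inversions, i.e. a normalised Kendall tau distance. This measure is swapped in here purely for illustration; it is not the probabilistic Bayesian model mentioned above, and the function name and task labels are hypothetical.

```python
def order_deviation(completed, nominal):
    """Fraction of task pairs completed in the opposite order to the nominal design.

    Returns 0.0 for a sequence that follows the nominal order exactly and 1.0
    for a fully reversed one. Tasks absent from `nominal` are ignored.
    """
    rank = {task: i for i, task in enumerate(nominal)}
    ranks = [rank[t] for t in completed if t in rank]
    n = len(ranks)
    if n < 2:
        return 0.0  # no pairs to compare
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j]
    )
    return inversions / (n * (n - 1) / 2)
```

Applying the same function to session labels instead of task labels would give a between-session score, which allows the between-session and within-session regularity discussed above to be compared separately.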
Recent advances in the synthesis of polar molecular materials have produced practical alternatives to ferroelectric ceramics, opening up exciting new avenues for their incorporation into modern electronic devices. However, in order to realize the full potential of polar polymer and molecular crystals for modern technological applications, it is paramount to assemble and evaluate all the available data for such compounds, identifying descriptors that could be associated with an emergence of ferroelectricity. In this work, we utilized data-driven approaches to judiciously shortlist candidate materials from a wide chemical space that could possess ferroelectric functionalities. An importance-sampling based method was utilized to address the challenge of having a limited amount of available data on already known organic ferroelectrics. Sets of molecular- and crystal-level descriptors were combined with a Random Forest Regression algorithm in order to predict spontaneous polarization of the shortlisted compounds with an average error of ~20%.
Joint clustering and feature learning methods have shown remarkable performance in unsupervised representation learning. However, a training schedule that alternates between feature clustering and network parameter updates leads to unstable learning of visual representations. To overcome this challenge, we propose Online Deep Clustering (ODC), which performs clustering and network updates simultaneously rather than alternately. Our key insight is that the cluster centroids should evolve steadily to keep the classifier stably updated. Specifically, we design and maintain two dynamic memory modules, i.e., a samples memory to store sample labels and features, and a centroids memory for centroid evolution. We break down the abrupt global clustering into steady memory updates and batch-wise label re-assignment. The process is integrated into network update iterations. In this way, labels and the network evolve shoulder-to-shoulder rather than alternately. Extensive experiments demonstrate that ODC stabilizes the training process and boosts the performance effectively. Code: https://github.com/open-mmlab/OpenSelfSup.
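The two-memory idea described above might be sketched roughly as below, using plain Python lists in place of tensors. The momentum-style feature update, the nearest-centroid re-assignment rule, the mean-based centroid refresh, and every name in the block are assumptions made for illustration; the released code at the URL above is the authoritative implementation.

```python
def nearest_centroid(feature, centroids):
    """Index of the centroid closest to `feature` in squared Euclidean distance."""
    def sq_dist(c):
        return sum((f - x) ** 2 for f, x in zip(feature, c))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def update_memories(batch, features, labels, centroids, momentum=0.5):
    """One batch-wise update of the samples memory and the centroids memory, in place.

    batch     : dict mapping sample index -> freshly computed feature vector
    features  : samples memory, one stored feature vector per sample
    labels    : samples memory, one pseudo-label per sample
    centroids : centroids memory, one vector per cluster
    """
    touched = set()
    for idx, new_feat in batch.items():
        # steady (momentum) update of the stored feature, an assumed update rule
        features[idx] = [
            momentum * old + (1 - momentum) * new
            for old, new in zip(features[idx], new_feat)
        ]
        old_label = labels[idx]
        # batch-wise label re-assignment against the current centroids
        labels[idx] = nearest_centroid(features[idx], centroids)
        touched.update((old_label, labels[idx]))
    # refresh only the centroids of clusters whose membership or features changed
    for k in touched:
        members = [features[i] for i, lab in enumerate(labels) if lab == k]
        if members:
            centroids[k] = [sum(col) / len(members) for col in zip(*members)]
```

In a training loop, this function would be called once per mini-batch right after the network's forward pass, so that pseudo-labels change a few samples at a time instead of all at once after a global clustering pass.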
In this paper, we identify a radically new viewpoint on the collective behaviour of groups of intelligent agents. We first develop a highly general abstract model for the possible future lives that these agents may encounter as a result of their decisions. In the context of these possible futures, we show that the causal entropic principle, whereby agents follow behavioural rules that maximise their entropy over all paths through the future, predicts many of the observed features of social interactions between individuals in both human and animal groups. Our results indicate that agents are often able to maximise their future path entropy by remaining cohesive as a group, and that this cohesion leads to collectively intelligent outcomes that depend strongly on the distribution of the number of future paths that are possible. We derive social interaction rules that are consistent with maximum-entropy group behaviour for both discrete and continuous decision spaces. Our analysis further predicts that social interactions are likely to be fundamentally based on Weber's law of response to proportional stimuli, supporting many studies that find a neurological basis for this stimulus-response mechanism, and providing a novel basis for the common assumption of linearly additive social forces in simulation studies of collective behaviour.