No Arabic abstract
In educational applications, Knowledge Tracing (KT), the problem of accurately predicting students responses to future questions by summarizing their knowledge states, has been widely studied for decades as it is considered a fundamental task towards adaptive online learning. Among all the proposed KT methods, Deep Knowledge Tracing (DKT) and its variants are by far the most effective ones due to the high flexibility of the neural network. However, DKT often ignores the inherent differences between students (e.g. memory skills, reasoning skills, ...), averaging the performances of all students, leading to the lack of personalization, and therefore was considered insufficient for adaptive learning. To alleviate this problem, in this paper, we proposed Leveled Attentive KNowledge TrAcing (LANA), which firstly uses a novel student-related features extractor (SRFE) to distill students unique inherent properties from their respective interactive sequences. Secondly, the pivot module was utilized to dynamically reconstruct the decoder of the neural network on attention of the extracted features, successfully distinguishing the performance between students over time. Moreover, inspired by Item Response Theory (IRT), the interpretable Rasch model was used to cluster students by their ability levels, and thereby utilizing leveled learning to assign different encoders to different groups of students. With pivot module reconstructed the decoder for individual students and leveled learning specialized encoders for groups, personalized DKT was achieved. Extensive experiments conducted on two real-world large-scale datasets demonstrated that our proposed LANA improves the AUC score by at least 1.00% (i.e. EdNet 1.46% and RAIEd2020 1.00%), substantially surpassing the other State-Of-The-Art KT methods.
High-quality education is one of the keys to achieving a more sustainable world. The recent COVID-19 epidemic has triggered the outbreak of online education, which has enabled both students and teachers to learn and teach at home. Meanwhile, it is now possible to record and research a large amount of learning data using online learning platforms in order to offer better intelligent educational services. Knowledge Tracing (KT), which aims to monitor students evolving knowledge state, is a fundamental and crucial task to support these intelligent services. Therefore, an increasing amount of research attention has been paid to this emerging area and considerable progress has been made. In this survey, we propose a new taxonomy of existing basic KT models from a technical perspective and provide a comprehensive overview of these models in a systematic manner. In addition, many variants of KT models have been proposed to capture more complete learning process. We then review these variants involved in three phases of the learning process: before, during, and after the student learning, respectively. Moreover, we present several typical applications of KT in different educational scenarios. Finally, we provide some potential directions for future research in this fast-growing field.
The most important task in personalized news recommendation is accurate matching between candidate news and user interest. Most of existing news recommendation methods model candidate news from its textual content and user interest from their clicked news in an independent way. However, a news article may cover multiple aspects and entities, and a user usually has different kinds of interest. Independent modeling of candidate news and user interest may lead to inferior matching between news and users. In this paper, we propose a knowledge-aware interactive matching method for news recommendation. Our method interactively models candidate news and user interest to facilitate their accurate matching. We design a knowledge-aware news co-encoder to interactively learn representations for both clicked news and candidate news by capturing their relatedness in both semantic and entities with the help of knowledge graphs. We also design a user-news co-encoder to learn candidate news-aware user interest representation and user-aware candidate news representation for better interest matching. Experiments on two real-world datasets validate that our method can effectively improve the performance of news recommendation.
Knowledge tracing, the act of modeling a students knowledge through learning activities, is an extensively studied problem in the field of computer-aided education. Although models with attention mechanism have outperformed traditional approaches such as Bayesian knowledge tracing and collaborative filtering, they share two limitations. Firstly, the models rely on shallow attention layers and fail to capture complex relations among exercises and responses over time. Secondly, different combinations of queries, keys and values for the self-attention layer for knowledge tracing were not extensively explored. Usual practice of using exercises and interactions (exercise-response pairs) as queries and keys/values respectively lacks empirical support. In this paper, we propose a novel Transformer based model for knowledge tracing, SAINT: Separated Self-AttentIve Neural Knowledge Tracing. SAINT has an encoder-decoder structure where exercise and response embedding sequence separately enter the encoder and the decoder respectively, which allows to stack attention layers multiple times. To the best of our knowledge, this is the first work to suggest an encoder-decoder model for knowledge tracing that applies deep self-attentive layers to exercises and responses separately. The empirical evaluations on a large-scale knowledge tracing dataset show that SAINT achieves the state-of-the-art performance in knowledge tracing with the improvement of AUC by 1.8% compared to the current state-of-the-art models.
As artificial intelligence (AI)-empowered applications become widespread, there is growing awareness and concern for user privacy and data confidentiality. This has contributed to the popularity of federated learning (FL). FL applications often face data distribution and device capability heterogeneity across data owners. This has stimulated the rapid development of Personalized FL (PFL). In this paper, we complement existing surveys, which largely focus on the methods and applications of FL, with a review of recent advances in PFL. We discuss hurdles to PFL under the current FL settings, and present a unique taxonomy dividing PFL techniques into data-based and model-based approaches. We highlight their key ideas, and envision promising future trajectories of research towards new PFL architectural design, realistic PFL benchmarking, and trustworthy PFL approaches.
The needs for precisely estimating a students academic performance have been emphasized with an increasing amount of attention paid to Intelligent Tutoring System (ITS). However, since labels for academic performance, such as test scores, are collected from outside of ITS, obtaining the labels is costly, leading to label-scarcity problem which brings challenge in taking machine learning approaches for academic performance prediction. To this end, inspired by the recent advancement of pre-training method in natural language processing community, we propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction. DPA pre-trains two models, a generator and a discriminator, and fine-tunes the discriminator on academic performance prediction. In DPAs pre-training phase, a sequence of interactions where some tokens are masked is provided to the generator which is trained to reconstruct the original sequence. Then, the discriminator takes an interaction sequence where the masked tokens are replaced by the generators outputs, and is trained to predict the originalities of all tokens in the sequence. Compared to the previous state-of-the-art generative pre-training method, DPA is more sample efficient, leading to fast convergence to lower academic performance prediction error. We conduct extensive experimental studies on a real-world dataset obtained from a multi-platform ITS application and show that DPA outperforms the previous state-of-the-art generative pre-training method with a reduction of 4.05% in mean absolute error and more robust to increased label-scarcity.