No Arabic abstract
Students feedback is an important source of collecting students opinions to improve the quality of training activities. Implementing sentiment analysis into student feedback data, we can determine sentiments polarities which express all problems in the institution since changes necessary will be applied to improve the quality of teaching and learning. This study focused on machine learning and natural language processing techniques (NaiveBayes, Maximum Entropy, Long Short-Term Memory, Bi-Directional Long Short-Term Memory) on the VietnameseStudents Feedback Corpus collected from a university. The final results were compared and evaluated to find the most effective model based on different evaluation criteria. The experimental results show that the Bi-Directional LongShort-Term Memory algorithm outperformed than three other algorithms in terms of the F1-score measurement with 92.0% on the sentiment classification task and 89.6% on the topic classification task. In addition, we developed a sentiment analysis application analyzing student feedback. The application will help the institution to recognize students opinions about a problem and identify shortcomings that still exist. With the use of this application, the institution can propose an appropriate method to improve the quality of training activities in the future.
Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.
This paper proposed several transformer-based approaches for Reliable Intelligence Identification on Vietnamese social network sites at VLSP 2020 evaluation campaign. We exploit both of monolingual and multilingual pre-trained models. Besides, we utilize the ensemble method to improve the robustness of different approaches. Our team achieved a score of 0.9378 at ROC-AUC metric in the private test set which is competitive to other participants.
Large-scale and high-quality corpora are necessary for evaluating machine reading comprehension models on a low-resource language like Vietnamese. Besides, machine reading comprehension (MRC) for the health domain offers great potential for practical applications; however, there is still very little MRC research in this domain. This paper presents ViNewsQA as a new corpus for the Vietnamese language to evaluate healthcare reading comprehension models. The corpus comprises 22,057 human-generated question-answer pairs. Crowd-workers create the questions and their answers based on a collection of over 4,416 online Vietnamese healthcare news articles, where the answers comprise spans extracted from the corresponding articles. In particular, we develop a process of creating a corpus for the Vietnamese machine reading comprehension. Comprehensive evaluations demonstrate that our corpus requires abilities beyond simple reasoning, such as word matching and demanding difficult reasoning based on single-or-multiple-sentence information. We conduct experiments using different types of machine reading comprehension methods to achieve the first baseline performances, compared with further models performances. We also measure human performance on the corpus and compared it with several powerful neural network-based and transfer learning-based models. Our experiments show that the best machine model is ALBERT, which achieves an exact match score of 65.26% and an F1-score of 84.89% on our corpus. The significant differences between humans and the best-performance model (14.53% of EM and 10.90% of F1-score) on the test set of our corpus indicate that improvements in ViNewsQA could be explored in the future study. Our corpus is publicly available on our website for the research purpose to encourage the research community to make these improvements.
Style transfer has been widely explored in natural language generation with non-parallel corpus by directly or indirectly extracting a notion of style from source and target domain corpus. A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. Availability of such dataset across a combination of styles limits the extension of these setups to multiple style dimensions. While cascading single-dimensional models across multiple styles is a possibility, it suffers from content loss, especially when the style dimensions are not completely independent of each other. In our work, we relax this requirement of jointly annotated data across multiple styles by using independently acquired data across different style dimensions without any additional annotations. We initialize an encoder-decoder setup with transformer-based language model pre-trained on a generic corpus and enhance its re-writing capability to multiple target style dimensions by employing multiple style-aware language models as discriminators. Through quantitative and qualitative evaluation, we show the ability of our model to control styles across multiple style dimensions while preserving content of the input text. We compare it against baselines involving cascaded state-of-the-art uni-dimensional style transfer models.
During natural or man-made disasters, humanitarian response organizations look for useful information to support their decision-making processes. Social media platforms such as Twitter have been considered as a vital source of useful information for disaster response and management. Despite advances in natural language processing techniques, processing short and informal Twitter messages is a challenging task. In this paper, we propose to use Deep Neural Network (DNN) to address two types of information needs of response organizations: 1) identifying informative tweets and 2) classifying them into topical classes. DNNs use distributed representation of words and learn the representation as well as higher level features automatically for the classification task. We propose a new online algorithm based on stochastic gradient descent to train DNNs in an online fashion during disaster situations. We test our models using a crisis-related real-world Twitter dataset.