Behavior of Modern Pre-trained Language Models Using the Example of Probing Tasks


Abstract in English

Modern transformer-based language models are revolutionizing NLP. However, existing studies into language modelling with BERT have been mostly limited to English-language material and do not pay enough attention to the implicit knowledge of language, such as semantic roles, presupposition and negations, that can be acquired by the model during training. Thus, the aim of this study is to examine behavior of the model BERT in the task of masked language modelling and to provide linguistic interpretation to the unexpected effects and errors produced by the model. For this purpose, we used a new Russian-language dataset based on educational texts for learners of Russian and annotated with the help of the National Corpus of the Russian language. In terms of quality metrics (the proportion of words, semantically related to the target word), the multilingual BERT is recognized as the best model. Generally, each model has distinct strengths in relation to a certain linguistic phenomenon. These observations have meaningful implications for research into applied linguistics and pedagogy, contribute to dialogue system development, automatic exercise making, text generation and potentially could improve the quality of existing linguistic technologies

References used

https://aclanthology.org/

Download