Do you want to publish a course? Click here

Disambiguating Grammatical Number and Gender With BERT

disambigguating العدد النحوي والجنس مع بيرت

242   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

Accurately dealing with any type of ambiguity is a major task in Natural Language Processing, with great advances recently reached due to the development of context dependent language models and the use of word or sentence embeddings. In this context, our work aimed at determining how the popular language representation model BERT handle ambiguity of nouns in grammatical number and gender in different languages. We show that models trained on one specific language achieve better results for the disambiguation process than multilingual models. Also, ambiguity is generally better dealt with in grammatical number than it is in grammatical gender, reaching greater distance values from one to another in direct comparisons of individual senses. The overall results show also that the amount of data needed for training monolingual models as well as application should not be underestimated.



References used
https://aclanthology.org/
rate research

Read More

Grammatical gender may be determined by semantics, orthography, phonology, or could even be arbitrary. Identifying patterns in the factors that govern noun genders can be useful for language learners, and for understanding innate linguistic sources o f gender bias. Traditional manual rule-based approaches may be substituted by more accurate and scalable but harder-to-interpret computational approaches for predicting gender from typological information. In this work, we propose interpretable gender classification models for French, which obtain the best of both worlds. We present high accuracy neural approaches which are augmented by a novel global surrogate based approach for explaining predictions. We introduce auxiliary attributes' to provide tunable explanation complexity.
Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills. However, existing GEC models tend to produce spurious corrections or fail to detect lots of errors. The quality estimation mode l is necessary to ensure learners get accurate GEC results and avoid misleading from poorly corrected sentences. Well-trained GEC models can generate several high-quality hypotheses through decoding, such as beam search, which provide valuable GEC evidence and can be used to evaluate GEC quality. However, existing models neglect the possible GEC evidence from different hypotheses. This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses. VERNet establishes interactions among hypotheses with a reasoning graph and conducts two kinds of attention mechanisms to propagate GEC evidence to verify the quality of generated hypotheses. Our experiments on four GEC datasets show that VERNet achieves state-of-the-art grammatical error detection performance, achieves the best quality estimation results, and significantly improves GEC performance by reranking hypotheses. All data and source codes are available at https://github.com/thunlp/VERNet.
We live in the era of his trademark is the use of figures and numbers in each of the significant aspects of life on the launch, but there are always difficulties face many people when reading the number correctly and properly, where some of them r esort to reading it colloquially, without being restricted by controls, and some others avoid the number as much as he, so he resorts to the language words that indicative of the number. For this, in this research, which we are dealing with, we are trying to shed light on some of the special issues of number, and the related rules and regulations and extensions, trying to focus on the points of difference; for weighting what we're saying more convincing and logical, as we are trying to differentiate between the figure and number, and said the simple numbers and numbers grades at each of the philosophy, mathematics, grammar and language scientists, to link these sciences together.
Passage retrieval and ranking is a key task in open-domain question answering and information retrieval. Current effective approaches mostly rely on pre-trained deep language model-based retrievers and rankers. These methods have been shown to effect ively model the semantic matching between queries and passages, also in presence of keyword mismatch, i.e. passages that are relevant to a query but do not contain important query keywords. In this paper we consider the Dense Retriever (DR), a passage retrieval method, and the BERT re-ranker, a popular passage re-ranking method. In this context, we formally investigate how these models respond and adapt to a specific type of keyword mismatch -- that caused by keyword typos occurring in queries. Through empirical investigation, we find that typos can lead to a significant drop in retrieval and ranking effectiveness. We then propose a simple typos-aware training framework for DR and BERT re-ranker to address this issue. Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
In today's society, the rapid development of communication technology allows us to communicate with people from different parts of the world. In the process of communication, each person treats others differently. Some people are used to using offens ive and sarcastic language to express their views. These words cause pain to others and make people feel down. Some people are used to sharing happiness with others and encouraging others. Such people bring joy and hope to others through their words. On social media platforms, these two kinds of language are all over the place. If people want to make the online world a better place, they will have to deal with both. So identifying offensive language and hope language is an essential task. There have been many assignments about offensive language. Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021-EACL 2021 uses another unique perspective -- to identify the language of Hope to make contributions to society. The XLM-Roberta model is an excellent multilingual model. Our team used a fine-tuned XLM-Roberta model to accomplish this task.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا