The main idea of this solution was to focus on corpus cleaning and preparation and then use an out-of-the-box solution (OpenNMT) with its default published transformer model. To prepare the corpus, we used a set of standard tools (such as Moses scripts and Python packages) together with other Python scripts, including a custom Python tokenizer that can replace numbers with variables, resolve the upper/lower-case issue in the vocabulary, and provide good segmentation for most punctuation. We also began exploring corpus cleaning based on statistical probability estimation over the source-target corpus, with unclear results. In addition, we ran some tests with syllable-based word segmentation, again with unclear results, so in the end, after word and sentence tokenization, we used SentencePiece BPE subword units to feed OpenNMT.
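As an illustration of the number-replacement step mentioned above, here is a minimal sketch in Python. It is not the authors' tokenizer: the placeholder format (`__NUM0__`), the regular expression, and the function names `mask_numbers` and `unmask_numbers` are all assumptions made for this example.

```python
import re

# Assumed pattern: digits optionally grouped by "." or "," (e.g. 1,250.50).
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")

def mask_numbers(sentence):
    """Replace each number with an indexed placeholder.

    Returns the masked sentence and a mapping from placeholder to the
    original number, so the numbers can be restored after translation.
    """
    mapping = {}

    def repl(match):
        key = f"__NUM{len(mapping)}__"  # hypothetical placeholder format
        mapping[key] = match.group(0)
        return key

    return NUM_RE.sub(repl, sentence), mapping

def unmask_numbers(sentence, mapping):
    """Put the original numbers back in place of their placeholders."""
    for key, value in mapping.items():
        sentence = sentence.replace(key, value)
    return sentence
```

Masking numbers this way keeps arbitrary numeric strings out of the vocabulary; whether the translation model copies the placeholders through unchanged would have to be checked on real output.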
This study describes the development of a Portuguese Community-Question Answering benchmark in the domain of Diabetes Mellitus using a Recognizing Question Entailment (RQE) approach. Given a premise question, RQE aims to retrieve semantically similar, already answered, archived questions. We build a new Portuguese benchmark corpus with 785 pairs of premise questions and archived answered questions, marked with relevance judgments by medical experts. Based on the benchmark corpus, we leveraged and evaluated several RQE approaches, ranging from traditional information retrieval methods to novel large pre-trained language models and ensemble techniques using learn-to-rank approaches. Our experimental results show that a supervised transformer-based method trained with multiple languages and for multiple tasks (MUSE) outperforms the alternatives. Our results also show that ensembles of methods (stacking) as well as a traditional (light) information retrieval method (BM25) can produce competitive results. Finally, among the tested strategies, those that exploit only the question (not the answer) provide the best effectiveness-efficiency trade-off. Code is publicly available.
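To make the BM25 baseline concrete, the following is a minimal sketch of question-only BM25 scoring in Python. It is not the study's code: the whitespace tokenization, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the function name `bm25_scores` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each archived question in `docs` against `query` with BM25.

    Assumes a non-empty list of documents and naive lowercase
    whitespace tokenization.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting archived questions by these scores in descending order gives a ranked candidate list; a real system would likely add Portuguese-specific tokenization and stop-word handling.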
Split-and-rephrase is a challenging task that involves transforming a given complex input sentence into multiple shorter sentences that retain an equivalent meaning. This rewriting approach rests on the idea that shorter sentences benefit human readers and, when used as a preprocessing step, improve downstream NLP tasks. This work presents a complete pipeline capable of performing the split-and-rephrase method in a cross-lingual manner. We trained sequence-to-sequence neural models on English corpora and applied them, jointly with BERT's masked language modeling, to predict the transformations in English and Brazilian Portuguese sentences. Contrary to traditional approaches that train models with extensive vocabularies, we present a non-trivial way to construct symbolic vocabularies generalized solely by grammatical classes (POS tags) and their respective recurrences, reducing the amount of training data needed. This pipeline showed competitive results, encouraging the expansion of the method to languages other than English.
In this paper, we describe the process of developing a multilayer semantic annotation scheme designed for extracting information from a European Portuguese corpus of news articles at three levels: temporal, referential and semantic role labelling. The novelty of this scheme is the harmonization of parts 1, 4 and 9 of the ISO 24617 Language resource management - Semantic annotation framework. This annotation framework includes a set of entity structures (participants, events, times) and a set of links (temporal, aspectual, subordination, objectal and semantic roles) with several tags and attribute values that ensure adequate semantic and visual representations of news stories.
Since the seminal work of Richard Montague in the 1970s, mathematical and logical tools have been used successfully to model several aspects of the meaning of natural language. However, visually impaired people continue to face serious difficulties in gaining full access to this important instrument. Our paper presents a work in progress whose main goal is to provide blind students and researchers with an adequate method for dealing with the different resources used in formal semantics. In particular, we intend to adapt the Portuguese Braille system to accommodate the most common symbols and formulas used in this kind of approach and to develop pedagogical procedures that make it easier to learn. By making this formalization compatible with Braille coding (both traditional and electronic), we hope to help blind people learn and use this notation, which is essential for a better understanding of a great number of semantic properties displayed by natural language.
This study examines the Portuguese occupation of the coast of Morocco (al-Maghrib al-Aqsa), which was shaped by several factors: Morocco's proximity to Portugal, the early emergence of the Portuguese state, and its pursuit of large-scale projects in Morocco for a variety of reasons, accompanied by the many campaigns covered in the study. Portugal was thus one of the first countries to seek footholds in the Maghreb, occupying the city of Ceuta in Morocco in 1415 AD and Mers el-Kébir in Algeria. However, its defeat at the Battle of Wadi al-Makhazin dashed these hopes and left it prey to its rival Spain, which was itself seeking to extend its influence over Morocco.
This paper is concerned with the theme of divine justice in The Spanish Tragedy (1592) by Thomas Kyd (1558-1594), a famous English Elizabethan dramatist. It first defines the term "didactic", and then discusses the Portuguese playlet in The Spanish Tragedy as a miniature play-within-the-play that reveals the issue of divine judgment. The paper concludes by asserting that this play is a tragedy of divine justice and punishment.