Do you want to publish a course? Click here

In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021). A feature of our system is that we used a pretrained multilingual BART (Bidirectional and Auto-Regressive Tr ansformer; mBART) model. Because publicly available models do not support some languages in the NICT-SAP task, we added these languages to the mBART model and then trained it using monolingual corpora extracted from Wikipedia. We fine-tuned the expanded mBART model using the parallel corpora specified by the NICT-SAP task. The BLEU scores greatly improved in comparison with those of systems without the pretrained model, including the additional languages.
In this paper, we develop Sindhi subjective lexicon using a merger of existing English resources: NRC lexicon, list of opinion words, SentiWordNet, Sindhi-English bilingual dictionary, and collection of Sindhi modifiers. The positive or negative sent iment score is assigned to each Sindhi opinion word. Afterwards, we determine the coverage of the proposed lexicon with subjectivity analysis. Moreover, we crawl multi-domain tweet corpus of news, sports, and finance. The crawled corpus is annotated by experienced annotators using the Doccano text annotation tool. The sentiment annotated corpus is evaluated by employing support vector machine (SVM), recurrent neural network (RNN) variants, and convolutional neural network (CNN).
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا