
Litescale: A Lightweight Tool for Best-worst Scaling Annotation


Publication date: 2021
Research language: English
Created by Shamra Editor





Best-worst Scaling (BWS) is an annotation methodology based on comparing and ranking instances rather than classifying or scoring them individually. Studies have shown that applying this methodology to NLP tasks yields higher-quality datasets. In this system demonstration paper, we present Litescale, a free software library for creating and managing BWS annotation tasks. Litescale computes the tuples to annotate, manages the users and the annotation process, and creates the final gold standard. Its functionalities can be accessed programmatically through a Python module or via two alternative user interfaces: a textual, console-based one and a graphical, Web-based one. We further developed and deployed a fully online version of Litescale with multi-user support.
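The abstract does not say how Litescale builds its tuples or derives the gold standard. As an illustration only, here is a minimal Python sketch of those two generic BWS steps. It assumes a simple shuffle-and-chunk scheme for tuple generation and the commonly used counting procedure for scores (times chosen best minus times chosen worst, divided by times seen). The names `generate_tuples` and `bws_scores` and the sample judgements are hypothetical and are not Litescale's API.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def generate_tuples(items: Sequence[str], tuple_size: int = 4,
                    passes: int = 2, seed: int = 0) -> List[Tuple[str, ...]]:
    """Hypothetical tuple builder: each pass shuffles the (unique) items and
    cuts them into consecutive chunks, so every item appears at least once per pass."""
    if tuple_size > len(items):
        raise ValueError("tuple_size cannot exceed the number of items")
    rng = random.Random(seed)
    tuples: List[Tuple[str, ...]] = []
    for _ in range(passes):
        order = list(items)
        rng.shuffle(order)
        for start in range(0, len(order), tuple_size):
            chunk = order[start:start + tuple_size]
            # Pad a short final chunk with distinct items not already in it.
            if len(chunk) < tuple_size:
                pool = [x for x in items if x not in chunk]
                chunk += rng.sample(pool, tuple_size - len(chunk))
            tuples.append(tuple(chunk))
    return tuples


def bws_scores(annotations: List[Tuple[Tuple[str, ...], str, str]]) -> Dict[str, float]:
    """Counting procedure: (times best - times worst) / times seen, in [-1, 1]."""
    best, worst, seen = Counter(), Counter(), Counter()
    for tup, chosen_best, chosen_worst in annotations:
        seen.update(tup)
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


items = ["a", "b", "c", "d", "e"]
tuples = generate_tuples(items, tuple_size=4, passes=2)

# Hypothetical judgements: (tuple shown, item chosen as best, item chosen as worst).
annotations = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "c", "d", "e"), "a", "e"),
    (("b", "c", "d", "e"), "b", "e"),
]
scores = bws_scores(annotations)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['a', 'b', 'c', 'd', 'e']
```

Counting is the simplest way to turn best/worst choices into a real-valued ranking; a production tool might instead use a model-based estimator or a tuple design that balances item co-occurrences more carefully.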




Read More

Human language encompasses more than just text; it also conveys emotions through tone and gestures. We present a case study of three simple and efficient Transformer-based architectures for predicting sentiment and emotion in multimodal data. The Late Fusion model merges unimodal features to create a multimodal feature sequence, the Round Robin model iteratively combines bimodal features using cross-modal attention, and the Hybrid Fusion model combines trimodal and unimodal features together to form a final feature sequence for predicting sentiment. Our experiments show that our small models are effective and outperform the publicly released versions of much larger, state-of-the-art multimodal sentiment analysis systems.
Being able to generate accurate word alignments is useful for a variety of tasks. While statistical word aligners can work well, especially when parallel training data are plentiful, multilingual embedding models have recently been shown to give good results in unsupervised scenarios. We evaluate an ensemble method for word alignment on four language pairs and demonstrate that by combining multiple tools, taking advantage of their different approaches, substantial gains can be made. This holds for settings ranging from very low-resource to high-resource. Furthermore, we introduce a new gold alignment test set for Icelandic and a new easy-to-use tool for creating manual word alignments.
Recent knowledge graph embedding (KGE) models based on hyperbolic geometry have shown great potential in low-dimensional embedding spaces. However, the necessity of hyperbolic space in KGE is still questionable, because calculations based on hyperbolic geometry are much more complicated than Euclidean operations. In this paper, building on the state-of-the-art hyperbolic model RotH, we develop two lightweight Euclidean models, called RotL and Rot2L. The RotL model simplifies the hyperbolic operations while keeping the flexible normalization effect. Building on RotL with a novel two-layer stacked transformation, the Rot2L model obtains improved representation capability while requiring fewer parameters and calculations than RotH. Link prediction experiments show that Rot2L achieves state-of-the-art performance on two widely used datasets for low-dimensional knowledge graph embeddings. Furthermore, RotL achieves performance similar to RotH while requiring only half the training time.
Current natural language processing research is strongly focused on raising accuracy, and this progress comes at the cost of super-heavy models with hundreds of millions or even billions of parameters. However, simple syntactic tasks such as part-of-speech (POS) tagging, dependency parsing or named entity recognition (NER) do not require the largest models to achieve acceptable results. In line with this assumption, we try to minimize the size of a model that jointly performs all three tasks. We introduce ComboNER: a lightweight tool, orders of magnitude smaller than state-of-the-art transformers. It is based on pre-trained subword embeddings and a recurrent neural network architecture. ComboNER operates on Polish language data. The model has outputs for POS tagging, dependency parsing and NER. Our paper presents insights from fine-tuning the model and reports its overall results.
Large pre-trained language models such as BERT have been the driving force behind recent improvements across many NLP tasks. However, BERT is only trained to predict missing words (either through masking or next sentence prediction) and has no knowledge of lexical, syntactic or semantic information beyond what it picks up through unsupervised pre-training. We propose a novel method to explicitly inject linguistic information in the form of word embeddings into any layer of a pre-trained BERT. When injecting counter-fitted and dependency-based embeddings, the performance improvements on multiple semantic similarity datasets indicate that such information is beneficial and currently missing from the original model. Our qualitative analysis shows that counter-fitted embedding injection is particularly beneficial, with notable improvements on examples that require synonym resolution.
