Natural Language Processing tools and resources have so far been created and trained mainly for standard varieties of language. Now that large amounts of data are gathered from social media, other varieties and registers also need to be processed, and these may present additional challenges. In this work, we focus on English and present a preliminary analysis that compares the TwitterAAE corpus, which is annotated for ethnicity, with WordNet, quantifying and explaining the online language that WordNet misses.
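As a rough illustration of what quantifying the language a lexicon misses could look like, the sketch below counts the tokens in a handful of tweets that are absent from a word list. It is not the authors' method: `TOY_LEXICON`, `tokenize`, `missing_from_lexicon`, and the example tweets are all hypothetical, the toy set does not reflect WordNet's actual contents, and a real analysis would presumably load the full WordNet inventory and apply lemmatization first.

```python
import re
from collections import Counter

# Hypothetical stand-in for a lexicon's lemma inventory (NOT real WordNet data);
# a real study would load the full resource instead of this toy set.
TOY_LEXICON = {"i", "go", "to", "the", "store", "tonight", "you", "good"}

# Assumption: a very simple tokenizer; tweets usually need more careful handling.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and keep runs of letters (apostrophes allowed)."""
    return TOKEN_RE.findall(text.lower())


def missing_from_lexicon(tweets, lexicon):
    """Return (counts of tokens absent from the lexicon, token-level coverage)."""
    missing = Counter()
    total = 0
    for tweet in tweets:
        for token in tokenize(tweet):
            total += 1
            if token not in lexicon:
                missing[token] += 1
    covered = total - sum(missing.values())
    coverage = covered / total if total else 0.0
    return missing, coverage


# Invented example tweets, for illustration only.
tweets = ["I finna go to the store tonight", "you good tho"]
missing, coverage = missing_from_lexicon(tweets, TOY_LEXICON)
print(missing.most_common())
print(f"coverage: {coverage:.2f}")
```

Ranking the missing tokens by frequency gives a starting point for the "explaining" step, since the most frequent gaps can then be inspected and categorized by hand.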
References used
https://aclanthology.org/