ﻻ يوجد ملخص باللغة العربية
Code-mixing or code-switching are the effortless phenomena of natural switching between two or more languages in a single conversation. Use of a foreign word in a language; however, does not necessarily mean that the speaker is code-switching because often languages borrow lexical items from other languages. If a word is borrowed, it becomes a part of the lexicon of a language; whereas, during code-switching, the speaker is aware that the conversation involves foreign words or phrases. Identifying whether a foreign word used by a bilingual speaker is due to borrowing or code-switching is a fundamental importance to theories of multilingualism, and an essential prerequisite towards the development of language and speech technologies for multilingual communities. In this paper, we present a series of novel computational methods to identify the borrowed likeliness of a word, based on the social media signals. We first propose context based clustering method to sample a set of candidate words from the social media data.Next, we propose three novel and similar metrics based on the usage of these words by the users in different tweets; these metrics were used to score and rank the candidate words indicating their borrowed likeliness. We compare these rankings with a ground truth ranking constructed through a human judgment experiment. The Spearmans rank correlation between the two rankings (nearly 0.62 for all the three metric variants) is more than double the value (0.26) of the most competitive existing baseline reported in the literature. Some other striking observations are, (i) the correlation is higher for the ground truth data elicited from the younger participants (age less than 30) than that from the older participants, and (ii )those participants who use mixed-language for tweeting the least, provide the best signals of borrowing.
In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on the signals from social media. In terms of Spearman correlation coefficient values, our methods perform more than two times better
With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through
The proliferation of news media available online simultaneously presents a valuable resource and significant challenge to analysts aiming to profile and understand social and cultural trends in a geographic location of interest. While an abundance of
Named entity recognition, and other information extraction tasks, frequently use linguistic features such as part of speech tags or chunkings. For languages where word boundaries are not readily identified in text, word segmentation is a key first st
Rumors and conspiracy theories thrive in environments of low confidence and low trust. Consequently, it is not surprising that ones related to the Covid-19 pandemic are proliferating given the lack of any authoritative scientific consensus on the vir