
hBERT + BiasCorp - Fighting Racism on the Web


Publication date: 2021
Language: English
Created by Shamra Editor





Subtle and overt racism is still present in both physical and online communities today and has impacted many lives across different segments of society. In this short piece of work, we present how we are tackling this societal issue with Natural Language Processing. We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, BreitbartNews and YouTube. The first batch (45,000 manually annotated) is ready for publication. We are currently in the final phase of manually labeling the remaining dataset using Amazon Mechanical Turk. BERT has been used widely in several downstream tasks. In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer. hBERT generalizes well across different distributions, with the added advantage of reduced model complexity. We are also releasing a JavaScript library and a Chrome Extension, to help developers make use of our trained model in web applications (e.g., chat applications) and to let users identify and report racially biased content on the web.
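The abstract does not specify which BERT layers hBERT modifies, so the following is only a sketch of the general idea: a modern Hopfield retrieval step, whose update rule softmax(beta * Q K^T) V coincides with scaled dot-product attention (Ramsauer et al., 2020), placed on top of a pretrained BERT encoder and feeding a bias-classification head. The class names, the bert-base-uncased backbone, and the placement of the Hopfield step are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HopfieldAssociation(nn.Module):
    """Hypothetical module: one retrieval step of a modern Hopfield network.

    For continuous patterns the update is softmax(beta * Q K^T) V, which
    coincides with scaled dot-product attention; beta is the inverse
    temperature of the Hopfield energy.
    """
    def __init__(self, hidden_size, beta=None):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.beta = beta if beta is not None else hidden_size ** -0.5

    def forward(self, states):
        q, k, v = self.query(states), self.key(states), self.value(states)
        retrieval = torch.softmax(self.beta * q @ k.transpose(-2, -1), dim=-1)
        return retrieval @ v

class BiasClassifier(nn.Module):
    """Sketch of an hBERT-like model: BERT encoder -> Hopfield step -> head."""
    def __init__(self, backbone="bert-base-uncased", num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(backbone)
        hidden = self.bert.config.hidden_size
        self.hopfield = HopfieldAssociation(hidden)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        states = self.hopfield(states)   # associative refinement of token states
        return self.head(states[:, 0])   # classify from the [CLS] position

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BiasClassifier()
batch = tokenizer(["an example comment to score for bias"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```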



Related research

AI assistants can now carry out tasks for users by directly interacting with website UIs. Current semantic parsing and slot-filling techniques cannot flexibly adapt to many different websites without being constantly re-trained. We propose FLIN, a natural language interface for web navigation that maps user commands to concept-level actions (rather than low-level UI actions), thus being able to flexibly adapt to different websites and handle their transient nature. We frame this as a ranking problem: given a user command and a webpage, FLIN learns to score the most relevant navigation instruction (involving action and parameter values). To train and evaluate FLIN, we collect a dataset using nine popular websites from three domains. Our results show that FLIN was able to adapt to new websites in a given domain.
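As a sketch of the ranking formulation described in the FLIN abstract (not the paper's actual architecture), one can score each candidate navigation instruction against the user command with a shared encoder and sort by relevance. The encoder choice, the [CLS] pooling, and the instruction format below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MatchScorer(nn.Module):
    """Shared encoder; relevance = dot product of [CLS] embeddings."""
    def __init__(self, backbone="bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(backbone)
        self.encoder = AutoModel.from_pretrained(backbone)

    def embed(self, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        return self.encoder(**batch).last_hidden_state[:, 0]

    @torch.no_grad()
    def rank(self, command, candidates):
        cmd = self.embed([command])            # (1, hidden)
        cands = self.embed(candidates)         # (n, hidden)
        scores = (cands @ cmd.T).squeeze(-1)   # one relevance score per candidate
        order = torch.argsort(scores, descending=True)
        return [(candidates[i], scores[i].item()) for i in order]

scorer = MatchScorer()
ranked = scorer.rank(
    "book a table for two tonight",
    ["reserve_table(people=2, date=tonight)", "search_menu(query=dinner)"],
)
print(ranked[0])  # highest-scoring navigation instruction first
```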
Given the more widespread nature of natural language interfaces, it is increasingly important to understand who is accessing those interfaces and how they are being used. In this paper, we explore spellchecking in the context of web search with children as the target audience. In particular, via a literature review we show that, while widely used, popular search tools are ill-designed for children. We then use spellcheckers as a case study to highlight the need for an interdisciplinary approach that brings together natural language processing, education, and human-computer interaction to address a known information retrieval problem: query misspelling. We conclude that it is imperative that those for whom the interfaces are designed have a voice in the design process.
Internet-only publications are considered by many to be the fourth type of journalism, owing to the advantages of the internet, through which many publishing problems were solved. Since the city of Jerusalem lives under special conditions (being under Israeli occupation), the present study investigates the informational presence of this city on the internet. The purpose of this study is to measure the size of that informational presence in comparison with other Arab capital cities, and to study the digital documentary content of the websites which dealt with this city. The study answers a number of questions related to the content of those documents and websites, which were analyzed during April, May and June of 2009.
In the pandemic period, the stay-at-home trend forced businesses to switch their activities to digital mode; for example, app-based payment methods, social distancing via social media platforms, and other digital means have become an integral part of our lives. Sentiment analysis of textual information in user comments is a topical task in emotion AI because user comments and reviews are not homogeneous: they carry sparse context and can mislead both humans and computers. Barriers arise from emotional language enriched with slang, peculiar spelling, transliteration, use of emoji and their symbolic counterparts, and code-switching. For low-resource languages, sentiment analysis has not been explored extensively because ready-made tools and linguistic resources are absent. This research focuses on developing a method for aspect-based sentiment analysis of Kazakh-language reviews in the Android Google Play Market.
The objective of this work was to introduce an effective approach based on the AraBERT language model for fighting the COVID-19 infodemic in tweets. It was arranged as a two-step pipeline, where the first step involved a series of pre-processing procedures to transform Twitter jargon, including emojis and emoticons, into plain text, and the second step exploited a version of AraBERT, pre-trained on plain text, to fine-tune and classify the tweets with respect to their label. The use of language models pre-trained on plain text rather than on tweets was motivated by the necessity to address two critical issues shown by the scientific literature: (1) pre-trained language models are widely available in many languages, avoiding the time-consuming and resource-intensive training of models on tweets from scratch and allowing researchers to focus only on fine-tuning; (2) available plain-text corpora are larger than tweet-only ones, allowing for better performance.
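A minimal sketch of such a two-step pipeline follows, assuming the emoji library for demojization, the aubmindlab/bert-base-arabertv02 checkpoint, and a binary label set; none of these specifics are confirmed by the abstract, and the classification head is untrained until fine-tuned.

```python
import re
import emoji
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def to_plain_text(tweet):
    """Step 1: turn Twitter jargon into plain text."""
    text = emoji.demojize(tweet, delimiters=(" ", " "))  # emojis -> words
    text = re.sub(r"https?://\S+", " ", text)            # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # drop mentions, hashtags
    return re.sub(r"\s+", " ", text).strip()

# Step 2: a pre-trained AraBERT (assumed checkpoint) with a classification head.
checkpoint = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

batch = tokenizer([to_plain_text("مثال تغريدة 😷 https://t.co/xyz")],
                  return_tensors="pt", truncation=True)
logits = model(**batch).logits  # head is randomly initialized until fine-tuned
```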


