Today, news media organizations regularly engage with readers by enabling them to comment on news articles. This creates the need for comment moderation and removal of disallowed comments -- a time-consuming task often performed by human moderators. In this paper, we approach the problem of automatic news comment moderation as the classification of comments into blocked and not-blocked categories. We construct a novel dataset of annotated English comments, experiment with cross-lingual transfer of comment labels, and evaluate several machine learning models on datasets of Croatian and Estonian news comments. Team name: SuperAdmin; Challenge: Detection of blocked comments; Tools/models: CroSloEn BERT, FinEst BERT, 24Sata comment dataset, Ekspress comment dataset.
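To make the task framing concrete, the following is a minimal sketch of comment moderation as binary classification. It is not the method used in the paper, which relies on BERT-based models; it swaps in a toy multinomial Naive Bayes baseline built from the Python standard library only. The class name `NaiveBayesModerator`, the label strings, and the four training comments are hypothetical and invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesModerator:
    """Toy multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, comments, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        # Per-class word frequencies gathered from the training comments.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(comments, labels):
            self.word_counts[label].update(tokenize(text))
        # Shared vocabulary across both classes.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                smoothed = self.word_counts[c][tok] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data; real experiments would use annotated comment datasets.
train_comments = [
    "you are an idiot and a liar",
    "stupid idiot go away",
    "thank you for this interesting article",
    "great reporting, very interesting read",
]
train_labels = ["blocked", "blocked", "not_blocked", "not_blocked"]

model = NaiveBayesModerator().fit(train_comments, train_labels)
print(model.predict("what an idiot"))
print(model.predict("interesting article, thank you"))
```

A bag-of-words baseline like this only illustrates the input and output of the task (comment text in, blocked or not-blocked label out); it does not capture context or transfer across languages, which is where multilingual pretrained models would be expected to help.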
References used
https://aclanthology.org/