
Building A Corporate Corpus For Threads Constitution


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English


Keywords: corporate corpus, available corporate corpus, threads constitution


Abstract

In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools. The overall goal of reconstructing threads is to provide value to the collaborator in various use cases, such as highlighting the important parts of a running discussion or reviewing upcoming commitments and deadlines. Since, to our knowledge, no corporate corpus is available for the French language that would allow us to address this problem of thread constitution, we present a method for building such corpora, covering the different aspects and steps that led to the creation of a pipeline to pseudo-anonymise data. This pipeline is a response to the constraints imposed by the General Data Protection Regulation (GDPR) in Europe and to compliance with the secrecy of correspondence.
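The abstract does not describe how the pseudo-anonymisation pipeline works. Purely as an illustration, one common approach is to replace personal identifiers with stable placeholders, so that the same person or address always maps to the same pseudonym and the links between messages in a thread survive. The Python sketch below assumes e-mail addresses are detected with a simple regular expression and person names come from a list supplied in advance; the class name `Pseudonymiser` and the `<KIND_n>` placeholder format are hypothetical and are not the authors' pipeline.

```python
import re

# Simple e-mail pattern: local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Pseudonymiser:
    """Hypothetical sketch: map identifiers to stable placeholders."""

    def __init__(self, known_names):
        # Try longer names first so "Marie Dupont" wins over "Marie".
        self.name_patterns = [
            re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            for name in sorted(known_names, key=len, reverse=True)
        ]
        self.mapping = {}   # (kind, lower-cased value) -> placeholder
        self.counters = {}  # kind -> last number used for that kind

    def _alias(self, value, kind):
        # Case-insensitive lookup so "Marie" and "MARIE" share a pseudonym.
        key = (kind, value.lower())
        if key not in self.mapping:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            self.mapping[key] = f"<{kind}_{self.counters[kind]}>"
        return self.mapping[key]

    def __call__(self, text):
        # Replace e-mails first so names inside addresses are not split.
        text = EMAIL_RE.sub(lambda m: self._alias(m.group(0), "EMAIL"), text)
        for pattern in self.name_patterns:
            text = pattern.sub(lambda m: self._alias(m.group(0), "PERSON"), text)
        return text
```

Reusing one `Pseudonymiser` instance across a whole conversation is what keeps pseudonyms consistent from message to message. A realistic pipeline would also need a named-entity recogniser for names that are not known in advance.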


Related research

Building a Corpus for Corporate Websites Machine Translation Evaluation. A Step by Step Methodological Approach

Association for Computational Linguistics, 2021 (article)

The aim of this paper is to describe the process carried out to develop a parallel corpus of texts extracted from the corporate websites of southern Spanish SMEs in the health sector, which will serve as the basis for MT quality assessment. The stages for compiling the parallel corpora were: (i) selection of websites with content available in both English and Spanish, (ii) downloading of the HTML files of the selected websites, (iii) filtering of files and pairing of English files with their Spanish equivalents, (iv) compilation of individual corpora (EN and ES) for each of the selected websites, (v) merging of the individual corpora into two general corpora, one in English and the other in Spanish, (vi) selection of a representative sample of segments to be used as originals (ES) and reference translations (EN), and (vii) building of the parallel corpus intended for MT evaluation. The resulting parallel corpus will serve as the basis for future machine translation quality assessment. In addition, the monolingual corpora generated during the process could serve as a base for research focused on linguistic analysis, whether bilingual or monolingual.
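Step (iii), pairing English files with their Spanish equivalents, can be approximated when a site keeps a language prefix in its paths. The sketch below assumes a hypothetical `<lang>/<page>` layout (for example `es/index.html` and `en/index.html`) and pairs pages whose remaining path is identical, returning (Spanish, English) tuples to match the ES-original and EN-reference roles in step (vi). The paper does not say which pairing method was used, and the function name `pair_by_language` is illustrative.

```python
from pathlib import PurePosixPath


def pair_by_language(paths, src="es", tgt="en"):
    """Pair pages whose paths differ only in a leading language segment."""
    pages = {src: {}, tgt: {}}  # lang -> {path without lang prefix: full path}
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) > 1 and parts[0] in pages:
            rest = "/".join(parts[1:])
            pages[parts[0]][rest] = path
    # Keep only pages present in both languages; unmatched files are dropped.
    shared = sorted(pages[src].keys() & pages[tgt].keys())
    return [(pages[src][key], pages[tgt][key]) for key in shared]
```

Sites that translate their URL slugs (for example `es/servicios.html` versus `en/services.html`) defeat this heuristic and would need content-based alignment instead.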

Keywords: step-by-step methodological approach, corporate websites, machine translation evaluation

A Corpus for Dimensional Sentiment Classification on YouTube Streaming Service

Association for Computational Linguistics, 2021 (article)

Streaming service platforms such as YouTube provide a discussion function that lets audiences worldwide share comments. YouTubers who upload videos to the platform want to track how these videos perform. However, YouTube's current analytics provide only a few performance indicators, such as average view duration, browsing history, and variance in audience demographics, and lack sentiment analysis of audience comments. The paper therefore proposes multi-dimensional sentiment indicators, namely YouTuber preference, Video preference, and Excitement level, to capture comprehensive sentiment in audience comments on videos and YouTubers. To compare classifiers, we experiment with deep learning-based, machine learning-based, and BERT-based classifiers that automatically detect the three sentiment indicators in audience comments. Experimental results indicate that the BERT-based classifier outperforms the other classifiers in F1-score, with a notable improvement on the Excitement level indicator. Therefore, the multiple sentiment detection tasks on a video streaming platform can be addressed by the proposed multi-dimensional sentiment indicators combined with a BERT classifier, which achieves the best results.
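The abstract compares classifiers by F1-score without stating how it was averaged. As an illustration only, macro-averaged F1 (the unweighted mean of per-class F1) could be computed separately for each of the three indicators. The choice of macro averaging, the function name `macro_f1`, and the example labels below are assumptions.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold/predicted labels, scored one indicator at a time.
results = {
    "excitement_level": (["high", "low", "low", "medium"],
                         ["high", "low", "medium", "medium"]),
}
for indicator, (gold, pred) in results.items():
    print(indicator, round(macro_f1(gold, pred), 3))
```

In this toy example, "high" is predicted perfectly (F1 = 1) while "low" and "medium" each reach F1 = 2/3, so the macro average is 7/9. Scoring each indicator separately keeps a strong result on one dimension from masking a weak one on another.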

Keywords: corpus for dimensional sentiment, streaming service platform, dimensional sentiment classification

ParsTwiNER: A Corpus for Named Entity Recognition at Informal Persian

Association for Computational Linguistics, 2021 (article)

Because of unstructured sentences, misspellings, and other errors, finding named entities in a noisy environment such as social media takes much more effort. ParsTwiNER contains about 250k tokens gathered from Persian Twitter and annotated following standard guidelines such as MUC-6 and CoNLL 2003. Inter-annotator agreement, measured with Cohen's kappa coefficient, is 0.95, a high score. In this study, we show that some state-of-the-art models degrade on these corpora, and we train a new model using parallel transfer learning based on the BERT architecture. Experimental results show that the model works well on informal Persian as well as on formal Persian.
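Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of items on which two annotators agree and p_e is the agreement expected from each annotator's label frequencies. A minimal Python sketch follows. Applying it to token-level tags such as `PER`, `LOC`, and `O` is an assumption about how agreement might be measured, and the toy labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement p_o: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of the two annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both used one identical label throughout; kappa undefined
    return (p_o - p_e) / (1 - p_e)
```

For the sequences `["PER", "PER", "LOC", "O", "O", "O"]` and `["PER", "LOC", "LOC", "O", "O", "O"]`, p_o = 5/6 and p_e = (2·1 + 1·2 + 3·3)/36 = 13/36, so κ = 17/23 ≈ 0.74, noticeably lower than the raw 83% agreement.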

Keywords: named entity recognition, named entity

Sarcasm Detection and Building an English Language Corpus in Real Time

Association for Computational Linguistics, 2021 (article)

This is a research proposal for doctoral research into sarcasm detection, and the real-time compilation of an English language corpus of sarcastic utterances. It details the previous research into similar topics, the potential research directions and the research aims.

Keywords: English language corpus, building an English corpus, real time

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues

Association for Computational Linguistics, 2021 (article)

Recently, the Machine Translation (MT) community has become more interested in document-level evaluation, especially in light of reactions to claims of "human parity", since examining quality at the level of the document rather than the sentence allows suprasentential context to be assessed, providing a more reliable evaluation. This paper presents a document-level corpus in English, annotated with the context-related issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, across six different domains. The corpus can be used as a challenge test set for evaluation, as a training/testing corpus for MT, and for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.

Keywords: document-level corpus, annotated corpus, DELA corpus
