
NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance


Publication date: 2021
Language: English





Pre-trained neural language models achieve high performance on natural language inference (NLI) tasks, but whether they actually understand the meaning of the sequences they process remains unclear. We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models' meaning-understanding capabilities. Specifically, we apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often yield nonsensical sentence pairs. If model accuracy on the corrupted data remains high, the dataset likely contains statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Our proposed controls can therefore serve as a crash test when developing high-quality data for NLI tasks.
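To make the corruption idea concrete, here is a minimal sketch of one such transformation: removing an entire word class from an NLI sentence pair. It assumes spaCy and its small English model, and it illustrates the general idea rather than reproducing the authors' implementation.

```python
# Illustrative sketch of a word-class-removal corruption, assuming spaCy
# and its small English model (python -m spacy download en_core_web_sm).
# Not the paper's actual transformation code, only the general idea.
import spacy

nlp = spacy.load("en_core_web_sm")

def remove_word_class(sentence: str, pos_tags=("VERB", "AUX")) -> str:
    """Drop every token whose coarse POS tag is in pos_tags."""
    doc = nlp(sentence)
    return " ".join(tok.text for tok in doc if tok.pos_ not in pos_tags)

premise = "A man is playing a guitar on stage."
hypothesis = "A person performs music."

# The corrupted pair is typically nonsensical; if a model still labels it
# as accurately as the original, the dataset likely leaks artefacts.
print(remove_word_class(premise))     # verbs removed -> nonsensical premise
print(remove_word_class(hypothesis))  # verbs removed -> nonsensical hypothesis
```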

Related research

With the rapid growth of the amount of data stored in cloud systems, the need for effective data processing becomes critical and urgent. This research presents a study of the most important characteristics of three database management systems: Hive, SQLMR, and MariaDB Galera. Hive is a cloud database management system; SQLMR is a hybrid system that integrates cloud and traditional capabilities; and MariaDB Galera is a traditional database management system developed to cope with cloud characteristics. We survey the most important developments made to these systems and then compare their data-processing performance based on the execution time of query operations as the volume of data changes. The goal is to characterize the systems' performance in practice, identify the requirements for developing an optimized data management system, and help users select the database system that meets their availability and scalability requirements.
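A minimal sketch of the kind of measurement described above: timing the same query at growing data volumes. Any Python DB-API 2.0 cursor works (e.g. a MariaDB connector); the connection and table names are hypothetical placeholders, not the study's benchmark.

```python
# Sketch of measuring query execution time to compare database systems.
# The cursor, query, and table names are illustrative placeholders.
import time

def mean_query_time(cursor, query: str, repetitions: int = 5) -> float:
    """Return the mean wall-clock execution time of `query` in seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        cursor.execute(query)
        cursor.fetchall()  # force the full result set to be materialised
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Hypothetical usage against tables of increasing size:
# for table in ("sales_1gb", "sales_10gb", "sales_100gb"):
#     print(table, mean_query_time(cur, f"SELECT COUNT(*) FROM {table}"))
```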
We present a generic method to compute the factual accuracy of a generated data summary with minimal user effort. We look at the problem as a fact-checking task to verify the numerical claims in the text. The verification algorithm assumes that the data used to generate the text is available. In this paper, we describe how the proposed solution has been used to identify incorrect claims about basketball textual summaries in the context of the Accuracy Shared Task at INLG 2021.
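To make the fact-checking idea concrete, here is a minimal sketch that extracts one kind of numerical claim from a summary and checks it against the source data. The claim pattern and the `box_score` structure are illustrative assumptions, not the shared-task system.

```python
# Sketch of numerical fact-checking: find "<player> scored <n> points"
# claims in a summary and verify them against the source data. The
# pattern and box_score structure are illustrative assumptions.
import re

box_score = {"LeBron James": {"points": 25, "rebounds": 7}}

def verify_point_claims(summary: str, data: dict) -> list:
    """Return (player, claimed_points, is_correct) triples."""
    results = []
    for player, stats in data.items():
        pattern = rf"{re.escape(player)} scored (\d+) points"
        for match in re.finditer(pattern, summary):
            claimed = int(match.group(1))
            results.append((player, claimed, claimed == stats["points"]))
    return results

print(verify_point_claims("LeBron James scored 25 points tonight.", box_score))
# -> [('LeBron James', 25, True)]
```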
The lecture presents an overview of data science and its relationship to statistics and machine learning, along with two case studies on the data scientist's role in designing solutions that extract knowledge from large volumes of available data. It also presents the main shared tasks at scientific conferences in which informatics students interested in this field can participate.
The field of big data has recently received considerable attention in diverse domains, including medicine, science, management, and politics. It concerns the study of datasets so large that common tools and methods cannot process, manage, and organize them within a reasonable time, and the construction of models to handle such data and make the required predictions. Several approaches have emerged for these studies, including models driven by the data themselves and models based on simulation. This article clarifies the difference between the two kinds of model and applies a new approach that integrates them to produce a better model, applied to the problem of greenhouses.
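A minimal sketch of the integration idea under a simple assumption: blend a simulation model's estimate with a data-driven model's prediction through a weighted combination. The blending weight is an assumption for illustration, not the article's actual method.

```python
# Sketch of a simple hybrid of a simulation model and a data-driven
# model, e.g. for a greenhouse temperature estimate. The weight alpha
# is an illustrative assumption, not the article's method.
def hybrid_estimate(sim_value: float, data_value: float, alpha: float = 0.5) -> float:
    """Weighted blend of a simulation output and a learned prediction."""
    return alpha * sim_value + (1.0 - alpha) * data_value

# Hypothetical usage: a physics-based simulation predicts 26.0 degrees C,
# a model trained on sensor data predicts 24.5; trust each source equally.
print(hybrid_estimate(26.0, 24.5))  # 25.25
```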
Data in general encodes human biases by default; being aware of this is a good start, and research on how to handle it is ongoing. The term 'bias' is used extensively in various contexts in NLP systems. Our research focuses specifically on biases such as gender, race, religion, demographics, and other intersectional views of bias that prevail in text-processing systems and systematically discriminate against specific populations, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity, and inclusion of specific populations in NLP applications. Tools and technology at the intermediate level consume biased data and transfer or amplify this bias to downstream applications. However, being colour-blind or gender-neutral alone is not enough when designing unbiased technology; instead, we should make a conscious effort to design a unified framework to measure and benchmark bias. In this paper, we recommend six measures and one augmented measure based on observations of bias in data, annotations, text representations, and debiasing techniques.
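As a deliberately simple example of quantifying such bias, the sketch below measures how strongly an occupation word co-occurs with male versus female words in a corpus. The word lists and corpus are illustrative assumptions and are not the paper's proposed measures.

```python
# Toy co-occurrence measure of gender skew for a target word. The word
# lists and corpus are illustrative assumptions, not the paper's measures.
import re
from collections import Counter

MALE = {"he", "him", "his", "man", "men"}
FEMALE = {"she", "her", "hers", "woman", "women"}

def gender_skew(sentences, target: str) -> float:
    """Return a score in [-1, 1]; > 0 means male-skewed contexts."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if target in tokens:
            counts["m"] += len(tokens & MALE)
            counts["f"] += len(tokens & FEMALE)
    total = counts["m"] + counts["f"]
    return 0.0 if total == 0 else (counts["m"] - counts["f"]) / total

corpus = ["He said the doctor was late.", "She is a doctor.",
          "The doctor said he would call."]
print(gender_skew(corpus, "doctor"))  # ~0.33: mildly male-skewed here
```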
