Expected Validation Performance and Estimation of a Random Variable's Maximum

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate three estimators and find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare between different models, and analyze how frequently each estimator leads to drawing incorrect conclusions about which of two models performs best. We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
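As a rough illustration of what estimating expected validation performance involves, the sketch below computes two common estimators of the expected maximum of n i.i.d. draws from N observed validation scores. One is a plug-in estimate that resamples with replacement from the observed scores. The other averages the maximum over all size-n subsets drawn without replacement, which is unbiased for n ≤ N. The function names are hypothetical, and whether these match the exact estimators analyzed in the paper is an assumption, since the abstract does not define them.

```python
from math import comb


def expected_max_with_replacement(scores, n):
    """Plug-in estimate: expected max of n draws WITH replacement from the sample.

    Uses P(max <= v_(i)) = (i / N) ** n under the empirical distribution.
    """
    v = sorted(scores)
    N = len(v)
    return sum(
        x * ((i / N) ** n - ((i - 1) / N) ** n)
        for i, x in enumerate(v, start=1)
    )


def expected_max_without_replacement(scores, n):
    """Average of the max over all size-n subsets (draws WITHOUT replacement).

    The i-th smallest score is the maximum of exactly C(i-1, n-1) subsets.
    """
    v = sorted(scores)
    N = len(v)
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= len(scores)")
    weighted = sum(x * comb(i - 1, n - 1) for i, x in enumerate(v, start=1))
    return weighted / comb(N, n)


if __name__ == "__main__":
    # Hypothetical validation accuracies from six tuning runs.
    accuracies = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73]
    for budget in range(1, len(accuracies) + 1):
        print(
            budget,
            round(expected_max_with_replacement(accuracies, budget), 4),
            round(expected_max_without_replacement(accuracies, budget), 4),
        )
```

Plotting either quantity against the budget n gives a performance-versus-budget curve of the kind the abstract describes. At n = 1 both functions reduce to the sample mean, and at n = N the without-replacement version returns the observed maximum.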

References used

https://aclanthology.org/

Shared Task in Evaluating Accuracy: Leveraging Pre-Annotations in the Validation Process

Association for Computational Linguistics, 2021, article

We hereby present our submission to the Shared Task in Evaluating Accuracy at the INLG 2021 Conference. Our evaluation protocol relies on three main components: rules and text classifiers that pre-annotate the dataset, a human annotator who validates the pre-annotations, and a web interface that facilitates this validation. Our submission in fact consists of two submissions: we first analyze solely the performance of the rules and classifiers (pre-annotations), and then the human evaluation aided by those pre-annotations using the web interface (hybrid). The code for the web interface and the classifiers is publicly available.

Keywords: task in evaluating; evaluating accuracy

Studying the performance and the Efficiency of random linear network coding in achieving the reliable transfer of data in losing wireless networks

Tishreen University, 2017, research paper

Wireless networks suffer from frequent packet loss for many reasons, such as interference, collision, and fading, which makes the wireless medium unreliable for data transfer. The main methods for ensuring reliability in this medium are the Transmission Control Protocol (TCP) and automatic repeat request (ARQ). Recently, network coding has emerged as a new technology that replaces the traditional forwarding method (store-and-forward) with a more effective and intelligent one (code-and-forward), which increases both the capacity and the throughput of these networks. In this research, random linear network coding is used as a promising technology for achieving reliable data transfer in lossy wireless networks, and we study the improvement it brings to the performance of these networks in unicast and multicast transmission. To evaluate the efficiency of this technology and compare its performance with that of reliable transfer protocols, we use the network simulator NS3. Simulation results showed that random linear network coding achieves reliable data transfer with higher throughput, less delay, and fewer transmissions than the protocols (TCP, ARQ).
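The code-and-forward idea can be sketched in a few lines. The example below is a minimal illustration, not the study's NS3 setup: it assumes coefficients drawn from GF(2) (so combining packets is plain XOR), represents packets as integers, and models loss as an independent drop probability. The names `encode` and `decode` are hypothetical.

```python
import random


def encode(packets, rng):
    """Return (coefficients, payload): a random XOR combination of the source packets."""
    coeffs = [rng.randint(0, 1) for _ in packets]
    payload = 0
    for c, p in zip(coeffs, packets):
        if c:
            payload ^= p
    return coeffs, payload


def decode(coded, k):
    """Gauss-Jordan elimination over GF(2).

    Returns the k source packets, or None if the received combinations
    do not yet have full rank.
    """
    rows = [(list(c), p) for c, p in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None  # not enough independent combinations yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_coeffs, pivot_payload = rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rc, rp = rows[r]
                rows[r] = ([a ^ b for a, b in zip(rc, pivot_coeffs)], rp ^ pivot_payload)
    return [rows[i][1] for i in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    source = [0x11, 0x22, 0x33, 0x44]
    received, decoded, sent = [], None, 0
    while decoded is None and sent < 1000:
        coded_packet = encode(source, rng)
        sent += 1
        if rng.random() < 0.3:  # assumed 30% loss rate on the channel
            continue
        received.append(coded_packet)
        decoded = decode(received, len(source))
    print(sent, decoded == source)
```

The receiver does not need any particular packet to arrive, only enough linearly independent combinations, which is why the sender can keep transmitting fresh combinations instead of retransmitting specific lost packets as ARQ does.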

Keywords: reliability; random linear network coding; TCP (transmission control protocol); wireless networks; ARQ (automatic repeat request)

Using Multi-Sets of Features to improve the Performance of Automatic Signature Verification Systems

Damascus University, 2010, research paper

For decades, published Automatic Signature Verification (ASV) works depended on a single feature set. Some researchers selected this feature set based on their experience, while others selected it using feature selection algorithms that choose the best feature set (bfs). In practical systems, the documents containing the signatures can be noisy, and recognition of the check writer in multi-signatory accounts is required. Because of the errors caused by such requirements and by data quality, improving the performance of ASV becomes a necessity. In this paper, a new technique for ASV decision making using Multi-Sets of Features is introduced. The experimental results show that the introduced technique yields a significant improvement in forgery detection and in the overall performance of the system.

Keywords: Multi-Sets of Features; improving the performance of Automatic Signature Verification Systems; Automatic Signature Verification Systems

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

Association for Computational Linguistics, 2021, article

Pre-trained neural language models give high performance on natural language inference (NLI) tasks, but whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostic test suite which allows one to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to nonsensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.
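A corruption of the kind described, removing an entire word class from both sentences of a pair, might look like the sketch below. It assumes the tokens are already part-of-speech tagged (the tags here are written by hand, whereas a real pipeline would use a tagger), and the function names are hypothetical rather than taken from the paper.

```python
def drop_word_class(tagged_tokens, pos_to_drop):
    """Remove every token whose part-of-speech tag equals pos_to_drop."""
    return [token for token, pos in tagged_tokens if pos != pos_to_drop]


def corrupt_pair(premise, hypothesis, pos_to_drop):
    """Apply the same word-class removal to both sentences of an NLI pair."""
    return (
        " ".join(drop_word_class(premise, pos_to_drop)),
        " ".join(drop_word_class(hypothesis, pos_to_drop)),
    )


if __name__ == "__main__":
    # Hand-tagged toy example; the tags are an assumption for illustration.
    premise = [("A", "DET"), ("man", "NOUN"), ("plays", "VERB"), ("guitar", "NOUN")]
    hypothesis = [("A", "DET"), ("person", "NOUN"), ("makes", "VERB"), ("music", "NOUN")]
    print(corrupt_pair(premise, hypothesis, "NOUN"))
```

Comparing a model's accuracy on the original pairs with its accuracy on pairs corrupted this way gives the diagnostic signal the abstract describes.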

Keywords: data sanity check; sanity check; assessing the effect

Simplifying annotation of intersections in time normalization annotation: exploring syntactic and semantic validation

Association for Computational Linguistics, 2021, article

While annotating normalized times in food security documents, we found that the semantically compositional annotation for time normalization (SCATE) scheme required several near-duplicate annotations to get the correct semantics for expressions like Nov. 7th to 11th 2021. To reduce this problem, we explored replacing SCATE's Sub-Interval property with a Super-Interval property, that is, making the smallest units (e.g., 7th and 11th) rather than the largest units (e.g., 2021) the heads of the intersection chains. To ensure that the semantics of annotated time intervals remained unaltered despite our changes to the syntax of the annotation scheme, we applied several different techniques to validate our changes. These validation techniques detected and allowed us to resolve several important bugs in our automated translation from Sub-Interval to Super-Interval syntax.

Keywords: exploring syntactic; time normalization annotation; time normalization
