As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical biases in benchmark data and probing studies have recently called their true capabilities into question. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in concept and implementation, is a quick, effective, and versatile measure that provides insight into the coherence of machine predictions.
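The abstract does not define the coherence measure itself; as a purely hypothetical sketch, prediction coherence can be operationalized as the rate at which a classifier assigns the same label to meaning-preserving variants of one input. The coherence_score helper and the keyword stub below are illustrative assumptions, not the paper's method.

```python
def coherence_score(predict, variant_groups):
    """Fraction of variant groups on which the model's predictions agree.

    predict        -- function mapping a text to a class label
    variant_groups -- list of lists; each inner list holds semantically
                      equivalent rewrites of one underlying input
    """
    coherent = 0
    for variants in variant_groups:
        labels = {predict(text) for text in variants}
        # A group counts as coherent when every variant gets the same label.
        if len(labels) == 1:
            coherent += 1
    return coherent / len(variant_groups)

# Toy usage with a keyword stub standing in for a real text classifier.
stub = lambda text: "positive" if "good" in text else "negative"
groups = [
    ["the movie was good", "the film was good"],
    ["the movie was good", "the film was enjoyable"],  # stub flips here
]
print(coherence_score(stub, groups))  # -> 0.5
```

Per-example accuracy never registers such disagreements, which is the abstract's point: a coherence-style measure can surface inconsistencies that accuracy hides.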
We present late-time Hubble Space Telescope imaging of the fields of six Swift GRBs lying at 5.0 < z < 9.5. Our data include very deep observations of the field of the most distant spectroscopically confirmed burst, GRB 090423, at z = 8.2. Using the precise...
We present the results of a pilot survey to find dust-reddened quasars by matching the FIRST radio catalog to the UKIDSS near-infrared survey, and using optical data from SDSS to select objects with very red colors. The deep K-band limit provided by UKIDSS...
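As a rough illustration of this kind of cross-match-and-color-cut selection (not the paper's actual pipeline), here is a minimal sketch using astropy; the coordinates, magnitudes, 2-arcsecond match radius, and r - K > 5 color threshold are all placeholder values.

```python
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# Hypothetical coordinates (degrees) standing in for the real catalogs.
radio = SkyCoord(ra=[150.10, 150.50] * u.deg, dec=[2.20, 2.35] * u.deg)
nir = SkyCoord(ra=[150.1001, 151.00] * u.deg, dec=[2.2001, 2.50] * u.deg)

# Nearest-neighbour match; keep pairs closer than an assumed 2" radius.
idx, sep, _ = radio.match_to_catalog_sky(nir)
matched = sep < 2.0 * u.arcsec

# Illustrative red-color cut on hypothetical SDSS r and UKIDSS K magnitudes;
# the r - K > 5 threshold is a placeholder, not the paper's criterion.
r_mag = np.array([21.5, 19.0])          # SDSS r for each radio source
k_mag = np.array([15.8, 17.5])[idx]     # UKIDSS K of each matched neighbour
red_candidates = matched & (r_mag - k_mag > 5.0)
print(np.flatnonzero(red_candidates))   # indices of red, matched sources
```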
With the ever-increasing complexity of neural language models, practitioners have turned to methods for understanding the predictions of these models. One of the most widely adopted approaches to model interpretability is feature-based interpretability...
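Feature-based interpretability assigns an importance score to each input feature. A minimal sketch of one standard recipe, gradient-times-input saliency, on a toy PyTorch classifier follows; the tiny embedding model and token ids are stand-ins, not a model from this line of work.

```python
import torch
import torch.nn as nn

# Tiny stand-in classifier: embedding layer + mean pooling + linear head.
# Real interpretability studies target large pre-trained models, but the
# attribution recipe below is the same.
embed = nn.Embedding(num_embeddings=100, embedding_dim=16)
head = nn.Linear(16, 2)

token_ids = torch.tensor([[5, 17, 42, 8]])   # one hypothetical sentence
emb = embed(token_ids)                       # shape (1, seq_len, 16)
emb.retain_grad()                            # keep gradients on activations

logits = head(emb.mean(dim=1))
pred = logits.argmax(dim=-1).item()
logits[0, pred].backward()                   # d(predicted score)/d(embedding)

# Gradient-times-input saliency: one importance score per input token.
saliency = (emb.grad * emb).sum(dim=-1).squeeze(0)
print(saliency.detach().tolist())
```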
To build an interpretable neural text classifier, most prior work has focused on designing inherently interpretable models or finding faithful explanations. A new line of work on improving model interpretability has just started, and many existing...
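For a concrete sense of the first approach, a textbook example of an inherently interpretable text classifier is a linear bag-of-words model, where each learned coefficient can be read directly as a word's contribution to the decision. A sketch with scikit-learn on an invented toy corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus; in practice this would be a real labelled dataset.
texts = ["great plot and acting", "great visuals", "dull plot", "dull and slow"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Each learned weight is directly readable as the word's class contribution,
# so the model is its own explanation.
for word, weight in zip(vec.get_feature_names_out(), clf.coef_[0]):
    print(f"{word:10s} {weight:+.2f}")
```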
Most adversarial attack methods on text classification can change the classifier's prediction by synonym substitution. We propose the adversarial sentence rewriting sampler (ASRS), which rewrites the whole sentence to generate more similar and higher-quality adversarial examples...
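The word-level baseline that ASRS moves beyond can be sketched in a few lines. Below is a hypothetical greedy synonym-substitution attack using WordNet, with a keyword stub standing in for a real classifier; it illustrates the word-swap family of attacks, not ASRS itself.

```python
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def synonym_attack(predict, tokens):
    """Greedy synonym substitution: swap one word at a time and keep the
    first substitution that flips the classifier's prediction."""
    original = predict(" ".join(tokens))
    for i, word in enumerate(tokens):
        synonyms = {
            lemma.name().replace("_", " ")
            for syn in wordnet.synsets(word)
            for lemma in syn.lemmas()
        } - {word}
        for candidate in synonyms:
            attacked = tokens[:i] + [candidate] + tokens[i + 1:]
            if predict(" ".join(attacked)) != original:
                return attacked          # adversarial example found
    return None                          # attack failed

# Keyword stub standing in for a real model.
stub = lambda text: "positive" if "good" in text else "negative"
print(synonym_attack(stub, ["the", "food", "was", "good"]))
```

Because each edit touches a single word, such attacks can leave sentences stilted or semantically off; rewriting the whole sentence is the alternative the abstract motivates.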