ﻻ يوجد ملخص باللغة العربية
We describe and evaluate different approaches to the conversion of gold standard corpus data from Stanford Typed Dependencies (SD) and Penn-style constituent trees to the latest English Universal Dependencies representation (UD 2.2). Our results indicate that pure SD to UD conversion is highly accurate across multiple genres, resulting in around 1.5% errors, but can be improved further to fewer than 0.5% errors given access to annotations beyond the pure syntax tree, such as entity types and coreference resolution, which are necessary for correct generation of several UD relations. We show that constituent-based conversion using CoreNLP (with automatic NER) performs substantially worse in all genres, including when using gold constituent trees, primarily due to underspecification of phrasal grammatical functions.
Back-translation provides a simple yet effective approach to exploit monolingual corpora in Neural Machine Translation (NMT). Its iterative variant, where two opposite NMT models are jointly trained by alternately using a synthetic parallel corpus ge
The growing literature on benign overfitting in overparameterized models has been mostly restricted to regression or binary classification settings; however, most success stories of modern machine learning have been recorded in multiclass settings. M
While the highly multilingual Universal Dependencies (UD) project provides extensive guidelines for clausal structure as well as structure within canonical nominal phrases, a standard treatment is lacking for many mischievous nominal phenomena that b
Mixing and decoherence are both manifestations of classicality within quantum theory, each of which admit a very general category-theoretic construction. We show under which conditions these two roads to classicality coincide. This is indeed the case
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentatio