PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: inducing probabilistic context-free, probabilistic context-free grammars, inducing probabilistic

Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and parsing, previous approaches cannot scale up to a relatively large number of (nonterminal and preterminal) symbols. In this work, we present a new parameterization form of PCFGs based on tensor decomposition, which has at most quadratic computational complexity in the symbol number and therefore allows us to use a much larger number of symbols. We further use neural parameterization for the new form to improve unsupervised parsing performance. We evaluate our model across ten languages and empirically demonstrate the effectiveness of using more symbols.
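
The complexity claim can be illustrated with a minimal sketch. It assumes a rank-d CP-style factorization of the binary-rule tensor, T[a][b][c] = sum over r of U[a][r] * V[b][r] * W[c][r]; the abstract only says "tensor decomposition", so this particular form, the names `U`, `V`, `W`, and the sizes `m` and `d` are illustrative assumptions, and the probability normalization of the rules is ignored. The sketch compares one inside-algorithm update computed with the materialized tensor (cubic in the number of symbols) against the same update computed from the factors alone.

```python
import random

def cp_rule_tensor(U, V, W):
    """Materialize T[a][b][c] = sum_r U[a][r] * V[b][r] * W[c][r] (cubic in symbols)."""
    m, d = len(U), len(U[0])
    return [[[sum(U[a][r] * V[b][r] * W[c][r] for r in range(d))
              for c in range(m)] for b in range(m)] for a in range(m)]

def inside_step_dense(T, left, right):
    """Naive update: out[a] = sum_{b,c} T[a][b][c] * left[b] * right[c]."""
    m = len(T)
    return [sum(T[a][b][c] * left[b] * right[c]
                for b in range(m) for c in range(m))
            for a in range(m)]

def inside_step_factored(U, V, W, left, right):
    """Same update without building T: project both children into the
    rank space, multiply elementwise, then project back to symbols."""
    m, d = len(U), len(U[0])
    lv = [sum(V[b][r] * left[b] for b in range(m)) for r in range(d)]
    rw = [sum(W[c][r] * right[c] for c in range(m)) for r in range(d)]
    return [sum(U[a][r] * lv[r] * rw[r] for r in range(d)) for a in range(m)]

def random_matrix(rows, cols, rng):
    """Helper for the demo: a rows x cols matrix of values in [0, 1)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
m, d = 6, 4  # hypothetical symbol count and rank, chosen only for the demo
U, V, W = (random_matrix(m, d, rng) for _ in range(3))
left = [rng.random() for _ in range(m)]   # stand-in inside scores, left child
right = [rng.random() for _ in range(m)]  # stand-in inside scores, right child

dense = inside_step_dense(cp_rule_tensor(U, V, W), left, right)
factored = inside_step_factored(U, V, W, left, right)
max_gap = max(abs(x - y) for x, y in zip(dense, factored))
print(max_gap < 1e-9)
```

Under this assumption the factored update costs on the order of m * d operations per split point, so it stays at most quadratic in the number of symbols as long as the rank d grows no faster than m, whereas the dense update costs m^3.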

References used

https://aclanthology.org/

Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap

646 - Association for Computational Linguistics 2021 (article)

Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question-context lexical overlap. This hinders QA models from generalizing to under-represented samples such as questions with low lexical overlap. Question generation (QG), a method for augmenting QA datasets, can be a solution for such performance degradation if QG can properly debias QA datasets. However, we discover that recent neural QG models are biased towards generating questions with high lexical overlap, which can amplify the dataset bias. Moreover, our analysis reveals that data augmentation with these QG models frequently impairs the performance on questions with low lexical overlap, while improving that on questions with high lexical overlap. To address this problem, we use a synonym replacement-based approach to augment questions with low lexical overlap. We demonstrate that the proposed data augmentation approach is simple yet effective to mitigate the degradation problem with only 70k synthetic examples.

Keywords: context lexical overlap, lexical overlap, low lexical overlap

Supertagging-based Parsing with Linear Context-free Rewriting Systems

662 - Association for Computational Linguistics 2021 (article)

We present the first supertagging-based parser for linear context-free rewriting systems (LCFRS). It utilizes neural classifiers and outperforms previous LCFRS-based parsers in both accuracy and parsing speed by a wide margin. Our results keep up with the best (general) discontinuous parsers; in particular, the scores for discontinuous constituents establish a new state of the art. The heart of our approach is an efficient lexicalization procedure which induces a lexical LCFRS from any discontinuous treebank. We describe a modification to usual chart-based LCFRS parsing that accounts for supertagging and introduce a procedure that transforms lexical LCFRS derivations into equivalent parse trees of the original treebank. Our approach is evaluated on the English Discontinuous Penn Treebank and the German treebanks Negra and Tiger.

Keywords: context-free rewriting systems, linear context-free rewriting, rewriting systems

Learning with Different Amounts of Annotation: From Zero to Many Labels

655 - Association for Computational Linguistics 2021 (article)

Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and inherent ambiguity of language, we hypothesize that a single label is not sufficient to learn the spectrum of language interpretation. We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on a natural language inference task and an entity typing task, even when we simply first train with single-label data and then fine-tune with multi-label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from training examples with different amounts of annotation (with zero, one, or multiple labels). This algorithm efficiently combines signals from uneven training data and brings additional gains in low annotation budget and cross-domain settings. Together, our method achieves consistent gains in two tasks, suggesting that distributing labels unevenly among training examples can be beneficial for many NLP tasks.

Keywords: single label, labels

Inducing Stereotypical Character Roles from Plot Structure

714 - Association for Computational Linguistics 2021 (article)

Stereotypical character roles, also known as archetypes or dramatis personae, play an important function in narratives: they facilitate efficient communication with bundles of default characteristics and associations and ease understanding of those characters' roles in the overall narrative. We present a fully unsupervised k-means clustering approach for learning stereotypical roles given only structural plot information. We demonstrate the technique on Vladimir Propp's structural theory of Russian folktales (captured in the extended ProppLearner corpus, with 46 tales), showing that our approach can induce six out of seven of Propp's dramatis personae with F1 measures of up to 0.70 (0.58 average), with an additional category for minor characters. We have explored various feature sets and variations of a cluster evaluation method. The best-performing feature set comprises plot functions, unigrams, tf-idf weights, and embeddings over coreference chain heads. Roles that are mentioned more often (Hero, Villain), or have clearly distinct plot patterns (Princess) are more strongly differentiated than less frequent or distinct roles (Dispatcher, Helper, Donor). Detailed error analysis suggests that the quality of the coreference chain and plot function annotations is critical for this task. We provide all our data and code for reproducibility.

Keywords: inducing stereotypical character, stereotypical character roles, plot structure

Can Latent Alignments Improve Autoregressive Machine Translation?

771 - Association for Computational Linguistics 2021 (article)

Latent alignment objectives such as CTC and AXE significantly improve non-autoregressive machine translation models. Can they improve autoregressive models as well? We explore the possibility of training autoregressive machine translation models with latent alignment objectives, and observe that, in practice, this approach results in degenerate models. We provide a theoretical explanation for these empirical results, and prove that latent alignment objectives are incompatible with teacher forcing.

Keywords: autoregressive machine translation, machine translation models
