
Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.
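To make concrete what a syntactic structural probe computes: the probe learns a linear projection under which the squared distance between two transformed word vectors approximates the distance between the corresponding words in the parse tree. A minimal, dependency-free sketch of the probe's distance function (the function name is ours, and `B` is a plain list-of-lists standing in for the learned projection matrix):

```python
def probe_distance(h_i, h_j, B):
    """Squared L2 norm of B(h_i - h_j). In a trained structural probe,
    this quantity approximates the two words' distance in the parse tree."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    proj = [sum(b_k * d_k for b_k, d_k in zip(row, diff)) for row in B]
    return sum(p * p for p in proj)

# With an identity projection, this reduces to squared Euclidean distance.
d = probe_distance([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Training fits `B` so that these squared distances match gold tree distances across a treebank; probing then asks how well such a `B` can be found for a given model's representations.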
Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/depth of ASTs. In this paper, we propose a novel model CAST that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and utilize a recursive neural network to encode the subtrees. Then, we aggregate the embeddings of subtrees by reconstructing the split ASTs to get the representation of the complete AST. Finally, AST representation, together with source code embedding obtained by a vanilla code token encoder, is used for code summarization. Extensive experiments, including the ablation study and the human evaluation, on benchmarks have demonstrated the power of CAST. To facilitate reproducibility, our code and data are available at https://github.com/DeepSoftwareAnalytics/CAST.
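The splitting step can be illustrated with Python's own `ast` module: collect subtrees rooted at block-level statements, which is a rough stand-in for CAST's hierarchical splitting (a sketch of the idea only, not the authors' implementation; CAST encodes each subtree with a learned recursive neural network and then reconstructs the full-AST representation):

```python
import ast

# Block-level statement types used as split points in this sketch.
SPLIT_NODES = (ast.FunctionDef, ast.For, ast.While, ast.If)

def split_ast(tree):
    """Collect subtrees rooted at block-level statements, approximating
    a hierarchical split of a large AST into smaller subtrees."""
    return [node for node in ast.walk(tree) if isinstance(node, SPLIT_NODES)]

SRC = """
def count_evens(xs):
    n = 0
    for x in xs:
        if x % 2 == 0:
            n += 1
    return n
"""
subtrees = split_ast(ast.parse(SRC))
names = [type(t).__name__ for t in subtrees]
```

Each collected subtree is small enough to encode on its own; a model like CAST would then aggregate the subtree embeddings back along the split structure.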
Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any given task. While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code to incorporate entire file-level context into a fixed-length window. Using concrete syntax trees of each source file, we extract syntactic hierarchies and integrate them into the context window by selectively removing from view more specific, less relevant scopes for a given task. We evaluate this approach on code generation tasks and joint translation of natural language and source code in the Python programming language, achieving a new state of the art in code completion and summarization for Python in the CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for user-experience-motivated tasks: code completion with normalized literals, and method body completion/code summarization conditioned on file-level context.
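The scope-removal idea can be sketched with Python's `ast` module: keep the target function intact and elide the bodies of all other functions, so a whole file fits into a smaller window (a minimal illustration under our own assumptions, not the paper's method; the hypothetical `keep_func` parameter names the scope being worked on, and `ast.unparse` requires Python 3.9+):

```python
import ast

def prune_context(source, keep_func):
    """Replace the bodies of all functions except `keep_func` with `...`,
    shrinking file-level context while preserving signatures as hints."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name != keep_func:
            node.body = [ast.Expr(ast.Constant(...))]
    return ast.unparse(tree)

SRC = """
def helper(a):
    return a + 1

def target(b):
    return helper(b) * 2
"""
pruned = prune_context(SRC, "target")
```

The pruned file still shows `helper`'s signature, which is the kind of less-specific-but-still-relevant information a hierarchy-aware context builder would retain.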
Recently, disentanglement based on a generative adversarial network or a variational autoencoder has significantly advanced the performance of diverse applications in CV and NLP domains. Nevertheless, those models still work on coarse levels in the disentanglement of closely related properties, such as syntax and semantics in human languages. This paper introduces a deep decomposable model based on VAE to disentangle syntax and semantics by using total correlation penalties on KL divergences. Notably, we decompose the KL divergence term of the original VAE so that the generated latent variables can be separated in a more clear-cut and interpretable way. Experiments on benchmark datasets show that our proposed model can significantly improve the disentanglement quality between syntactic and semantic representations for semantic similarity tasks and syntactic similarity tasks.
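The decomposition being penalized can be made concrete. A standard form (the β-TCVAE decomposition, which total-correlation penalties of this kind typically build on; we state it here as background, not as the paper's exact objective) splits the aggregate KL term of the VAE into three parts, the middle one being the total correlation:

```latex
\mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\right]
= \underbrace{I_q(x; z)}_{\text{index-code MI}}
+ \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
+ \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the total-correlation term pushes the latent dimensions (here, the syntactic and semantic variables) toward statistical independence, which is what makes the separation "clear-cut and interpretable."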
We present a simple method for extending transformers to source-side trees. We define a number of masks that limit self-attention based on relationships among tree nodes, and we allow each attention head to learn which mask or masks to use. On translation from English to various low-resource languages, and translation in both directions between English and German, our method always improves over simple linearization of the source-side parse tree and almost always improves over a sequence-to-sequence baseline, by up to +2.1 BLEU.
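One such mask can be sketched directly: for example, an "ancestor" mask that lets each node attend only to itself and its ancestors in the source-side tree (the mask definition and names here are illustrative; the paper defines several relation-based masks and lets each head learn which to apply):

```python
def ancestor_mask(parents):
    """Boolean mask where position i may attend to position j iff j is
    i itself or an ancestor of i in the tree. `parents[i]` is the
    parent index of node i, with -1 marking the root."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:          # walk up to the root, marking each ancestor
            mask[i][j] = True
            j = parents[j]
    return mask

# Tree: node 0 is the root; 1 and 2 are children of 0; 3 is a child of 1.
m = ancestor_mask([-1, 0, 0, 1])
```

In a transformer, such a mask would be applied additively (disallowed positions set to minus infinity) before the attention softmax, restricting each head to the named tree relation.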
External syntactic and semantic information has been largely ignored by existing neural coreference resolution models. In this paper, we present a heterogeneous graph-based model to incorporate syntactic and semantic structures of sentences. The proposed graph contains a syntactic sub-graph where tokens are connected based on a dependency tree, and a semantic sub-graph that contains arguments and predicates as nodes and semantic role labels as edges. By applying a graph attention network, we can obtain syntactically and semantically augmented word representation, which can be integrated using an attentive integration layer and gating mechanism. Experiments on the OntoNotes 5.0 benchmark show the effectiveness of our proposed model.
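The shape of such a heterogeneous graph can be sketched as a typed adjacency list: one edge set from the dependency tree, one from semantic role labels, merged over the same token indices (a data-layout sketch under our own naming, not the paper's code; the real model feeds this structure to a graph attention network):

```python
def build_graph(dep_edges, srl_edges):
    """Merge a dependency sub-graph and a semantic-role sub-graph into
    a single adjacency list keyed by token index, with typed edges."""
    graph = {}
    for head, dependent, rel in dep_edges:
        graph.setdefault(head, []).append((dependent, "dep:" + rel))
    for predicate, argument, role in srl_edges:
        graph.setdefault(predicate, []).append((argument, "srl:" + role))
    return graph

# Tokens of "Mary saw John": 0=Mary, 1=saw, 2=John; "saw" is head and predicate.
graph = build_graph(
    dep_edges=[(1, 0, "nsubj"), (1, 2, "obj")],
    srl_edges=[(1, 0, "ARG0"), (1, 2, "ARG1")],
)
```

Keeping the two sub-graphs distinguishable via edge types is what lets a downstream attention layer weight syntactic and semantic neighbors separately before the gated integration.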
This research studies the case markers of the Ugaritic language and the syntactic positions of expressions in the sentence, applying the comparative method. We show that the noun is case-marked, i.e. its case changes according to its place in the sentence and to the functional element preceding it, so that it may be nominative, accusative, or the object of a preposition. Our study also shows that the present-tense verb can be case-marked: it can be nominative, accusative, or jussive, and that case in this Semitic language can be marked by case endings, letters, vowel deletion, or nun-deletion. Because Ugaritic has three symbols for the hamza with short vowels, these correspond to the case markers in Arabic, and the case shows itself clearly in final position as (a), (u), and (i). By comparing Ugaritic expressions with Arabic ones, we have noted three case markers, namely (a), (u), and (i). The study points to the case markers common to both languages as well as to those that differ.
This research offers a comparative analysis of the negative sentence in the Semitic languages Akkadian, Hebrew, Ugaritic, and Arabic. It proceeds from linguistic usage in Hammurabi's laws, Genesis, the Holy Quran, and the myth of Aqhat. The method was to examine these texts, tracing the cases of agreement and disagreement in the syntactic functions of the negative particles. I tried to clarify the effect of these particles on the temporal meaning of the syntactic structure through the negation of the nominal sentence, the verbal sentence, and absolute negation. I then presented the features of "NOT", the oldest negative particle in these languages. This examination allowed the research to survey the cases of the negative sentence in the light of word order and apocopation. It addresses the relations between exception and negation, negation and interrogation, and negation and confirmation, and it draws attention to the phonetic changes in the negative particles. In conclusion, the research states that the linguistic study of the negative sentence belongs to the basic body of the language; it does not suffice to study the concept of the sentence as presented in the books of Arabic or Semitic syntax. The research shows that comparative Semitic study deepens the semantic study of the particles, presents scientific explanations of phonetic, syntactic, and morphological phenomena, and opens a wider space for the linguistic analysis of structures in the Semitic sentence.
This research presents the concepts of sentence syntax and text syntax, the difference between them, and their respective areas. It also tries to identify the obstacles that hinder the progress of this kind of linguistic instruction in Arab colleges, then examines the trends of linguistic study in which this kind of instruction appears. It further monitors the state of this instruction in Syrian colleges through one sample, Al Baath University, and concludes with the most important recommendations that can contribute to its development.
Al-Wahidi is one of the greatest syntactic critics to have explained al-Mutanabbi's Anthology. His explanation contains concepts, and syntactic and critical opinions, that deserve study and scrutiny. Al-Mutanabbi's poetry stands as a fertile domain for syntactic criticism, as is apparent in the critical debate over his poetry. Through his syntactic judgements, al-Wahidi attempts to support a doctrine, oppose a point of view, elaborate on what violates a proposed principle, or uncover a problematic issue somewhere in al-Mutanabbi's poetry, especially when disagreement among critics' opinions appears and dissimilarity among their doctrines and approaches materialises. Poetry was, and still is, a significant source for formulating syntactic rules, even if it witnessed some instability due to narrators' uncertainties and imprecision of transmission. Narratives and narrators of poetry have therefore varied, creating an obvious phenomenon that requires research and study of the effect it may have on syntactic rules, because syntax is a fundamental aspect of the culture of those interested in literary exegesis. This study focuses on one essential aspect of syntactic criticism as applied to al-Mutanabbi's poetry.