Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification


Added by David Pierce

Publication date 1998

Fields Informatics Engineering

Language English

Authors Claire Cardie - David Pierce

Computation and Language


Abstract

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
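The two training steps and the matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the benefit score or the matching heuristic, so the sketch assumes that a rule's benefit is its number of correct bracketings minus its number of incorrect bracketings on a held-out corpus, and that matching is greedy, left to right, preferring the longest rule. The function names, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter

# A sentence is (POS tags, gold base-NP spans); spans are (start, end), end exclusive.

def extract_rules(corpus):
    """Read the grammar: every POS-tag sequence bracketed as a base NP."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Assumed heuristic: greedy left-to-right, longest rule first."""
    max_len = max((len(r) for r in rules), default=0)
    found = []
    i = 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            candidate = tuple(tags[i:i + n])
            if candidate in rules:
                found.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no rule starts here; move on one tag
    return found

def benefit_scores(corpus, rules):
    """Assumed benefit: correct minus incorrect bracketings per rule."""
    scores = Counter()
    for tags, gold in corpus:
        for start, end, rule in match(tags, rules):
            scores[rule] += 1 if (start, end) in gold else -1
    return scores

def prune(rules, scores, threshold=1):
    """Keep only rules whose benefit reaches the (hypothetical) threshold."""
    return {r for r in rules if scores[r] >= threshold}

# Toy training corpus (hypothetical): "the boy likes running water", "water is wet"
train = [
    (["DT", "NN", "VBZ", "VBG", "NN"], {(0, 2), (3, 5)}),
    (["NN", "VBZ", "JJ"], {(0, 1)}),
]
# Toy held-out corpus (hypothetical): "the man is eating fish", "fish likes the water"
heldout = [
    (["DT", "NN", "VBZ", "VBG", "NN"], {(0, 2), (4, 5)}),
    (["NN", "VBZ", "DT", "NN"], {(0, 1), (2, 4)}),
]

rules = extract_rules(train)
scores = benefit_scores(heldout, rules)
pruned = prune(rules, scores)
print(sorted(pruned))
```

On this toy data the rule ("VBG", "NN") brackets "eating fish" incorrectly in the held-out corpus, receives a negative score, and is dropped, after which the shorter rule ("NN",) is free to match "fish" on its own.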


On Implementing an HPSG theory -- Aspects of the logical architecture, the formalization, and the implementation of head-driven phrase structure grammars

57 - Walt Detmar Meurers 1994

The paper presents some aspects involved in the formalization and implementation of HPSG theories. As a basis, the logical setups of Carpenter (1992) and King (1989, 1994) are briefly compared regarding their usefulness as a basis for HPSGII (Pollard and Sag 1994). The possibilities for expressing HPSG theories in the HPSGII architecture and in various computational systems (ALE, Troll, CUF, and TFS) are discussed. Besides a formal characterization of the possibilities, the paper investigates the specific choices for constraints with certain linguistic motivations, i.e. the lexicon, structure licensing, and grammatical principles. An ALE implementation of a theory for German proposed by Hinrichs and Nakazawa (1994) is used as an example, and the ALE grammar is included in the appendix.

Computation and Language

Contrastive Learning for Weakly Supervised Phrase Grounding

155 - Tanmay Gupta , Arash Vahdat , Gal Chechik 2020

Phrase grounding, the problem of associating image regions to caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language model guided word substitutions. Training with our negatives yields a $\sim 10\%$ absolute gain in accuracy over randomly-sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of $5.7\%$ to achieve $76.7\%$ accuracy on Flickr30K Entities benchmark.

Computer Vision and Pattern Recognition Computation and Language Machine Learning

Pattern-Based Context-Free Grammars for Machine Translation

331 - Koichi Takeda 1996

This paper proposes the use of "pattern-based" context-free grammars as a basis for building machine translation (MT) systems, which are now being adopted as personal tools by a broad range of users in the cyberspace society. We discuss major requirements for such tools, including easy customization for diverse domains, the efficiency of the translation algorithm, and scalability (incremental improvement in translation quality through user interaction), and describe how our approach meets these requirements.

Computation and Language

Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

83 - Felipe Sanchez-Martinez , Juan Antonio Perez-Ortiz , Rafael C. Carrasco 2020

Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refinement algorithm presented in this work is that this single non-terminal is overloaded and insufficiently discriminative, and that an adequate split of it into more specialised symbols could therefore lead to improved models. This paper presents a method to learn synchronous context-free grammars with a huge number of initial non-terminals, which are then grouped via a clustering algorithm. Our experiments show that the resulting smaller set of non-terminals correctly captures the contextual information that makes it possible to improve the BLEU score of the standard HSMT approach by a statistically significant margin.

Computation and Language Machine Learning

Transfer and Multi-Task Learning for Noun-Noun Compound Interpretation

121 - Murhaf Fares , Stephan Oepen , Erik Velldal 2018

In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun--noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations.

Computation and Language
