An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

44 0 0.0 ( 0 )

Download Cite

Added by Andreas Stolcke

Publication date 1994

fields Informatics Engineering

and research's language is English

Authors Andreas Stolcke

Computation and Language

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We describe an extension of Earleys parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earleys top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.

rate research

Certified Context-Free Parsing: A formalisation of Valiants Algorithm in Agda

61 - Jean-Philippe Bernardy , Patrik Jansson 2016

Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiants algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type-theory. As such, this paper can be read as a tutorial for the Agda system.

Logic in Computer Science

An efficient Cellular Potts Model algorithm that forbids cell fragmentation

270 - Marc Durand 2016

The Cellular Potts Model (CPM) is a lattice based modeling technique which is widely used for simulating cellular patterns such as foams or biological tissues. Despite its realism and generality, the standard Monte Carlo algorithm used in the scientific literature to evolve this model preserves connectivity of cells on a limited range of simulation temperature only. We present a new algorithm in which cell fragmentation is forbidden for all simulation temperatures. This allows to significantly enhance realism of the simulated patterns. It also increases the computational efficiency compared with the standard CPM algorithm even at same simulation temperature, thanks to the time spared in not doing unrealistic moves. Moreover, our algorithm restores the detailed balance equation, ensuring that the long-term stage is independent of the chosen acceptance rate and chosen path in the temperature space.

Soft Condensed Matter Cell Behavior

Computing the longest common prefix of a context-free language in polynomial time

110 - Michael Luttenberger , Raphaela Palenta , Helmut Seidl 2017

We present two structural results concerning longest common prefixes of non-empty languages. First, we show that the longest common prefix of the language generated by a context-free grammar of size $N$ equals the longest common prefix of the same grammar where the heights of the derivation trees are bounded by $4N$. Second, we show that each nonempty language $L$ has a representative subset of at most three elements which behaves like $L$ w.r.t. the longest common prefix as well as w.r.t. longest common prefixes of $L$ after unions or concatenations with arbitrary other languages. From that, we conclude that the longest common prefix, and thus the longest common suffix, of a context-free language can be computed in polynomial time.

Formal Languages and Automata Theory

On the Relation between Context-Free Grammars and Parsing Expression Grammars

474 - Fabio Mascarenhas , Sergio Medeiros , Roberto Ierusalimschy 2013

Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs.

Formal Languages and Automata Theory

Pattern-Based Context-Free Grammars for Machine Translation

331 - Koichi Takeda 1996

This paper proposes the use of ``pattern-based context-free grammars as a basis for building machine translation (MT) systems, which are now being adopted as personal tools by a broad range of users in the cyberspace society. We discuss major requirements for such tools, including easy customization for diverse domains, the efficiency of the translation algorithm, and scalability (incremental improvement in translation quality through user interaction), and describe how our approach meets these requirements.

Computation and Language