ترغب بنشر مسار تعليمي؟ اضغط هنا

An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

44   0   0.0 ( 0 )
 نشر من قبل Andreas Stolcke
 تاريخ النشر 1994
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Andreas Stolcke




اسأل ChatGPT حول البحث

We describe an extension of Earleys parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earleys top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.


قيم البحث

اقرأ أيضاً

Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiants algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type-theory. As such, this paper can be read as a tutorial for the Agda system.
270 - Marc Durand 2016
The Cellular Potts Model (CPM) is a lattice based modeling technique which is widely used for simulating cellular patterns such as foams or biological tissues. Despite its realism and generality, the standard Monte Carlo algorithm used in the scienti fic literature to evolve this model preserves connectivity of cells on a limited range of simulation temperature only. We present a new algorithm in which cell fragmentation is forbidden for all simulation temperatures. This allows to significantly enhance realism of the simulated patterns. It also increases the computational efficiency compared with the standard CPM algorithm even at same simulation temperature, thanks to the time spared in not doing unrealistic moves. Moreover, our algorithm restores the detailed balance equation, ensuring that the long-term stage is independent of the chosen acceptance rate and chosen path in the temperature space.
We present two structural results concerning longest common prefixes of non-empty languages. First, we show that the longest common prefix of the language generated by a context-free grammar of size $N$ equals the longest common prefix of the same gr ammar where the heights of the derivation trees are bounded by $4N$. Second, we show that each nonempty language $L$ has a representative subset of at most three elements which behaves like $L$ w.r.t. the longest common prefix as well as w.r.t. longest common prefixes of $L$ after unions or concatenations with arbitrary other languages. From that, we conclude that the longest common prefix, and thus the longest common suffix, of a context-free language can be computed in polynomial time.
Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper w e present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs.
331 - Koichi Takeda 1996
This paper proposes the use of ``pattern-based context-free grammars as a basis for building machine translation (MT) systems, which are now being adopted as personal tools by a broad range of users in the cyberspace society. We discuss major require ments for such tools, including easy customization for diverse domains, the efficiency of the translation algorithm, and scalability (incremental improvement in translation quality through user interaction), and describe how our approach meets these requirements.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا