We introduce functorial language models: a principled way to compute probability distributions over word sequences given a monoidal functor from grammar to meaning. This yields a method for training categorical compositional distributional (DisCoCat) models on raw text data. We provide a proof-of-concept implementation in DisCoPy, the Python toolbox for monoidal categories.
We present some categorical investigations into Wittgensteins language-games, with applications to game-theoretic pragmatics and question-answering in natural language processing.
In categorical compositional semantics of natural language one studies functors from a category of grammatical derivations (such as a Lambek pregroup) to a semantic category (such as real vector spaces). We compositionally build game-theoretic semantics of sentences by taking the semantic category to be the category whose morphisms are open games. This requires some modifications to the grammar category to compensate for the failure of open games to form a compact closed category. We illustrate the theory using simple examples of Wittgensteins language-games.
Distributional compositional (DisCo) models are functors that compute the meaning of a sentence from the meaning of its words. We show that DisCo models in the category of sets and relations correspond precisely to relational databases. As a consequence, we get complexity-theoretic reductions from semantics and entailment of a fragment of natural language to evaluation and containment of conjunctive queries, respectively. Finally, we define question answering as an NP-complete problem.
We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g. language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks shows our model variants based on deterministic attention.
In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions - including polysemy and existence of multi-word lexical items - into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.