In this paper we study the approximate minimization problem for language modelling. We assume we are given some language model as a black box. The objective is to obtain a weighted finite automaton (WFA) that fits within a given size constraint, mimics the behaviour of the original model, and minimizes some notion of distance between the black box and the extracted WFA. We provide an algorithm for the approximate minimization of black boxes trained for language modelling of sequential data over a one-letter alphabet. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. This allows us to use the spectral norm to measure the distance between the black box and the WFA. We provide theoretical guarantees for handling the potentially infinite-rank Hankel matrix of the black box without accessing the training data, and we prove that our method returns an asymptotically optimal approximation.
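Over a one-letter alphabet $\{a\}$, a WFA with $n$ states computes $f(a^k) = \alpha^\top A^k \beta$, and its Hankel matrix $H[i][j] = f(a^{i+j})$ has rank at most $n$. The minimal Python sketch below builds a finite block of $H$ and computes its exact rank; the 2-state WFA and all helper names are hypothetical. For comparison, a function such as $f(a^k) = 1/(k+1)$ yields the Hilbert matrix, which is nonsingular at every size, so its Hankel matrix has infinite rank and cannot be matched exactly by any WFA. This is the situation AAK theory addresses: the best spectral-norm error when approximating $H$ by a Hankel matrix of rank at most $r$ equals the singular value $\sigma_{r+1}(H)$. The sketch illustrates only the Hankel reformulation and is not the paper's algorithm.

```python
from fractions import Fraction

def wfa_value(alpha, A, beta, k):
    """f(a^k) = alpha^T A^k beta for a WFA over the one-letter alphabet {a}."""
    v, n = list(alpha), len(alpha)
    for _ in range(k):
        v = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]
    return sum(v[i] * beta[i] for i in range(n))

def hankel_block(f, size):
    """Top-left size x size block of the Hankel matrix H[i][j] = f(a^(i+j))."""
    return [[f(i + j) for j in range(size)] for i in range(size)]

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical 2-state WFA: f(a^k) = (5/2)(1/2)^k - (3/2)(1/3)^k.
alpha = [Fraction(1), Fraction(0)]
A = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(0), Fraction(1, 3)]]
beta = [Fraction(1), Fraction(1)]

def f(k):
    return wfa_value(alpha, A, beta, k)

print(rank(hankel_block(f, 6)))                             # 2: finite rank, 2 states
print(rank(hankel_block(lambda k: Fraction(1, k + 1), 6)))  # 6: Hilbert matrix, full rank
```

Increasing the block size leaves the first rank at 2, while the Hilbert block keeps full rank; this is why an approximation criterion such as the AAK bound is needed for black boxes whose Hankel matrix is not of finite rank.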
A weight normalization procedure, commonly called pushing, is introduced for weighted tree automata (wta) over commutative semifields. The normalization preserves the recognized weighted tree language even for nondeterministic wta, but it is most useful for bottom-up deterministic wta, where it can be used for minimization and equivalence testing. In both applications, a careful selection of the weights to be redistributed, followed by normalization, allows a reduction of the general problem to the corresponding problem for bottom-up deterministic unweighted tree automata. This approach was already successfully used by Mohri and Eisner for the minimization of deterministic weighted string automata. Moreover, the new equivalence test for two wta $M$ and $M'$ runs in time $\mathcal O\bigl((\lvert M \rvert + \lvert M' \rvert) \cdot \log(\lvert Q \rvert + \lvert Q' \rvert)\bigr)$, where $Q$ and $Q'$ are the state sets of $M$ and $M'$, respectively, improving on the previous best run time of $\mathcal O(\lvert M \rvert \cdot \lvert M' \rvert)$.
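The core idea of pushing can be illustrated on the string case mentioned above. The minimal Python sketch below is our own simplification: it handles deterministic weighted string automata over the commutative semifield of positive rationals under multiplication, not the paper's tree-automaton procedure. Each state $q$ receives a nonzero potential $\lambda(q)$; a transition $p \xrightarrow{a/w} q$ is reweighted to $\lambda(p)^{-1} \cdot w \cdot \lambda(q)$, the final weight $\rho(q)$ becomes $\lambda(q)^{-1}\rho(q)$, and the initial weight is multiplied by $\lambda(q_0)$. Because the semifield is commutative, the potentials telescope along every run, so every word keeps its weight. The dictionary layout, function names, example automaton, and the choice of potentials (normalizing the final weight to 1) are all hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding of a deterministic weighted string automaton over (Q_{>0}, *):
# "delta" maps (state, symbol) -> (next_state, weight); "final" maps state -> weight.

def weight(aut, word):
    """Weight of `word`; 0 if the run gets stuck or ends in a non-final state."""
    q, w = aut["start"], aut["init"]
    for a in word:
        if (q, a) not in aut["delta"]:
            return Fraction(0)
        q, t = aut["delta"][(q, a)]
        w *= t
    return w * aut["final"].get(q, Fraction(0))

def push(aut, lam):
    """Reweight with nonzero potentials lam[q]; the products telescope along runs."""
    delta = {
        (p, a): (q, t * lam[q] / lam[p])
        for (p, a), (q, t) in aut["delta"].items()
    }
    final = {q: r / lam[q] for q, r in aut["final"].items()}
    return {"start": aut["start"], "init": aut["init"] * lam[aut["start"]],
            "delta": delta, "final": final}

aut = {
    "start": 0,
    "init": Fraction(2),
    "delta": {(0, "a"): (1, Fraction(3)),
              (1, "a"): (1, Fraction(1, 2)),
              (1, "b"): (0, Fraction(5))},
    "final": {1: Fraction(4)},
}
# Hypothetical potentials: normalize the final weight of state 1 to 1.
pushed = push(aut, {0: Fraction(1), 1: Fraction(4)})
print(weight(aut, "aba"), weight(pushed, "aba"))  # 360 360
```

After pushing, the final weight of state 1 is 1 and the weight 4 has moved onto the incoming transitions; choosing potentials canonically is what lets the weights be treated as transition labels in a subsequent unweighted minimization or equivalence test.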
In this paper, by developing appropriate methods, we obtain for the first time characterizations of four fundamental notions of detectability for general labeled weighted automata over monoids (denoted by $\mathcal{A}^{\mathfrak{M}}$ for short), where the four notions are strong (periodic) detectability (SD and SPD) and weak (periodic) detectability (WD and WPD). First, we formulate the notions of concurrent composition, observer, and detector for $\mathcal{A}^{\mathfrak{M}}$. Second, we use the concurrent composition to give an equivalent condition for SD, the detector to give an equivalent condition for SPD, and the observer to give equivalent conditions for WD and WPD, all for general $\mathcal{A}^{\mathfrak{M}}$ without any assumption. Third, we prove that for a labeled weighted automaton over the monoid $(\mathbb{Q}^k,+)$ (denoted by $\mathcal{A}^{\mathbb{Q}^k}$), its concurrent composition, observer, and detector can be computed in NP, $2$-EXPTIME, and $2$-EXPTIME, respectively, by developing novel connections between $\mathcal{A}^{\mathbb{Q}^k}$ and both the NP-complete exact path length problem (proved by [Nykänen and Ukkonen, 2002]) and a subclass of Presburger arithmetic. As a result, we prove that for $\mathcal{A}^{\mathbb{Q}^k}$, SD can be verified in coNP, while SPD, WD, and WPD can be verified in $2$-EXPTIME. In particular, for $\mathcal{A}^{\mathbb{Q}^k}$ in which a distinct state can be reached from every state through some unobservable, instantaneous path, the detector can be computed in NP, and SPD can be verified in coNP. Finally, we prove that the problems of verifying SD and SPD of deterministic $\mathcal{A}^{\mathbb{N}}$ over the monoid $(\mathbb{N},+)$ are both NP-hard. The methods developed in this paper provide a foundation for characterizing other fundamental properties (e.g., diagnosability and opacity) in $\mathcal{A}^{\mathfrak{M}}$.
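To make the observer notion concrete, the minimal Python sketch below implements the classical observer of an unweighted labeled automaton: a subset construction over observable labels in which each subset is closed under unobservable events. It deliberately ignores weights, so it is not the paper's observer for $\mathcal{A}^{\mathfrak{M}}$, which must also account for monoid weights such as elapsed time. The encoding (transitions as lists of `(event, target)` pairs, unobservable events labeled `None`), the function names, and the example automaton are hypothetical.

```python
from collections import deque

def closure(states, trans, label):
    """All states reachable from `states` through unobservable events (label None)."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for event, r in trans.get(q, []):
            if label[event] is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def observer(init, trans, label):
    """Classical (unweighted) observer: subset construction over observable labels."""
    start = closure(init, trans, label)
    delta, seen, queue = {}, {start}, deque([start])
    while queue:
        X = queue.popleft()
        succ = {}
        for q in X:
            for event, r in trans.get(q, []):
                sigma = label[event]
                if sigma is not None:
                    succ.setdefault(sigma, set()).add(r)
        for sigma, targets in succ.items():
            Y = closure(targets, trans, label)
            delta[(X, sigma)] = Y
            if Y not in seen:
                seen.add(Y)
                queue.append(Y)
    return start, delta

# Hypothetical example: events a and b both emit label "x"; u is unobservable.
trans = {0: [("a", 1), ("b", 2)], 1: [("u", 2)], 2: [("a", 2)]}
label = {"a": "x", "b": "x", "u": None}
start, delta = observer({0}, trans, label)
```

Here the observer moves $\{0\} \to \{1, 2\} \to \{2\}$ on the observations `x`, `x`, so after two observations the current state is determined; conditions for WD and WPD are phrased in terms of such singleton observer states.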
We show that weighted automata (WAs) over the field of two elements can be exponentially more compact than nondeterministic finite state automata. To show this, we combine ideas from automata theory and communication complexity. Despite this compactness, weighted automata are efficiently learnable in Angluin's minimal adequate teacher model, using a number of queries that is polynomial in the size of the minimal weighted automaton. We include an algorithm for learning WAs over any field based on a linear-algebraic generalization of the Angluin-Schapire algorithm. Together, this produces a surprising result: weighted automata over fields are structured enough that even though they can be very compact, they are still efficiently learnable.
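The linear-algebraic view behind such learning algorithms rests on a classical fact: the minimal number of states of a WA over a field equals the rank, over that field, of its Hankel matrix $H[u][v] = f(uv)$. The minimal Python sketch below evaluates a WA over GF(2) and computes the GF(2) rank of a finite Hankel block, with rows stored as integer bitmasks. The function names and the 2-state example WA (which computes the parity of the number of $a$'s) are hypothetical; the sketch illustrates the rank bound only and is neither the paper's learning algorithm nor its compactness construction.

```python
from itertools import product

def evaluate(alpha, mats, beta, word):
    """f(w) = alpha^T M_{w_1} ... M_{w_n} beta over GF(2); entries are 0/1."""
    v, n = list(alpha), len(alpha)
    for a in word:
        M = mats[a]
        v = [sum(v[i] & M[i][j] for i in range(n)) % 2 for j in range(n)]
    return sum(v[i] & beta[i] for i in range(n)) % 2

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are int bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def hankel_rank(f, alphabet, max_len):
    """GF(2) rank of the Hankel block H[u][v] = f(uv) with |u|, |v| <= max_len."""
    words = ["".join(p) for k in range(max_len + 1)
             for p in product(alphabet, repeat=k)]
    rows = []
    for u in words:
        bits = 0
        for j, v in enumerate(words):
            if f(u + v):
                bits |= 1 << j
        rows.append(bits)
    return gf2_rank(rows)

# Hypothetical 2-state WA over GF(2): f(w) = (number of a's in w) mod 2.
alpha, beta = [1, 0], [0, 1]
mats = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}

def f(word):
    return evaluate(alpha, mats, beta, word)

print(f("aab"), f("ab"), hankel_rank(f, "ab", 2))  # 0 1 2
```

The Hankel block has rank 2, matching the two states; a learner in the minimal adequate teacher model can be viewed as discovering a basis of such a Hankel matrix from queries.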
In this paper, we address the approximate minimization problem of Markov chains (MCs) from a behavioral metric-based perspective. Specifically, given a finite MC and a positive integer k, we look for an MC with at most k states having minimal distance to the original. The metric considered in this work is the bisimilarity distance of Desharnais et al. For this metric we show that (1) optimal approximations always exist; (2) the problem has a bilinear program characterization; and (3) its threshold problem is in PSPACE and NP-hard. In addition to the bilinear program solution, we present an approach inspired by expectation maximization techniques for computing suboptimal solutions to the problem. Experiments suggest that our method gives a practical approach that outperforms the bilinear program implementation run on state-of-the-art bilinear solvers.
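In one common formulation, the bisimilarity distance is the least fixed point of: $d(s,t) = 1$ if $s$ and $t$ carry different labels, and otherwise $d(s,t) = \lambda \cdot K(d)(\tau(s), \tau(t))$, where $K(d)$ is the Kantorovich (optimal transport) lifting of $d$ to distributions, $\tau$ the transition function, and $\lambda$ a discount factor. The minimal Python sketch below approximates this fixed point by iteration, assuming $\lambda = 0.9$ (a hypothetical choice), labels given as a list, and the transition matrix given as nested lists. Each transport problem is solved as a small min-cost flow with successive shortest paths. All names and the example chain are hypothetical, and the sketch only computes state distances; it does not implement the paper's bilinear program or its expectation-maximization heuristic.

```python
INF = float("inf")

def kantorovich(mu, nu, cost, eps=1e-12):
    """Optimal transport cost between distributions mu and nu, solved as a
    min-cost flow with successive shortest paths (Bellman-Ford)."""
    n = len(mu)
    S, T, N = 2 * n, 2 * n + 1, 2 * n + 2
    graph, edges = [[] for _ in range(N)], []  # edge = [head, capacity, cost]

    def add(u, v, cap, c):
        # Forward edge gets an even index; its residual partner is index ^ 1.
        graph[u].append(len(edges)); edges.append([v, cap, c])
        graph[v].append(len(edges)); edges.append([u, 0.0, -c])

    for i in range(n):
        if mu[i] > eps:
            add(S, i, mu[i], 0.0)
        if nu[i] > eps:
            add(n + i, T, nu[i], 0.0)
        for j in range(n):
            add(i, n + j, 1.0, cost[i][j])  # total mass is 1, so capacity 1 suffices
    total, remaining = 0.0, 1.0
    while remaining > eps:
        dist, prev = [INF] * N, [-1] * N
        dist[S] = 0.0
        for _ in range(N - 1):  # Bellman-Ford: residual costs may be negative
            updated = False
            for u in range(N):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    v, cap, c = edges[e]
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v], updated = dist[u] + c, e, True
            if not updated:
                break
        if prev[T] == -1:
            break  # no augmenting path left
        flow, v = remaining, T
        while v != S:  # bottleneck capacity along the shortest path
            flow = min(flow, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = T
        while v != S:  # push the flow along the path
            edges[prev[v]][1] -= flow
            edges[prev[v] ^ 1][1] += flow
            v = edges[prev[v] ^ 1][0]
        total += flow * dist[T]
        remaining -= flow
    return total

def bisim_distance(labels, P, lam=0.9, tol=1e-12, max_iter=1000):
    """Iterate d(s,t) = 1 if labels differ, else lam * K(d)(P[s], P[t]), from d = 0."""
    n = len(labels)
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        new = [[0.0] * n for _ in range(n)]
        for s in range(n):
            for t in range(n):
                if labels[s] != labels[t]:
                    new[s][t] = 1.0
                elif s != t:
                    new[s][t] = lam * kantorovich(P[s], P[t], d)
        change = max(abs(new[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new
        if change < tol:
            break
    return d

# Hypothetical 6-state chain: states 2 and 3 are absorbing with distinct labels.
labels = ["a", "a", "b", "c", "a", "a"]
P = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0.5, 0.5, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0.5, 0, 0.5, 0, 0, 0],
    [0, 0.5, 0, 0.5, 0, 0],
]
d = bisim_distance(labels, P)
print(round(d[0][1], 4), round(d[4][5], 4))  # 0.45 0.6525
```

For states 4 and 5 the optimal coupling matches state 0 with state 1 and state 2 with state 3, giving $0.9 \cdot (0.5 \cdot 0.45 + 0.5 \cdot 1) = 0.6525$. An approximate minimization procedure would search over smaller chains to minimize this kind of distance from the initial state.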