Weighted finite automata (WFAs) extend classical finite-state automata by associating a weight — drawn from a semiring — with each transition, initial state, and final state. Rather than simply accepting or rejecting a string, a weighted automaton assigns a weight to each string, computed by combining transition weights along accepting paths. By choosing different semirings — the real semiring for probabilities, the tropical semiring for shortest paths, the Boolean semiring for standard acceptance — weighted automata provide a unified framework for diverse computational tasks in speech recognition, machine translation, and statistical NLP.
Semirings and Weight Computation
A semiring (K, ⊕, ⊗, 0̄, 1̄) consists of a set K with two operations, ⊕ (addition, for combining alternative paths) and ⊗ (multiplication, for combining weights along a single path), together with their identities: 0̄ is the identity for ⊕ and annihilates under ⊗, while 1̄ is the identity for ⊗. The weight assigned to a string w is computed by ⊗-multiplying the weights along each accepting path for w, then ⊕-summing over all accepting paths; a string with no accepting path receives 0̄. This abstraction encompasses a wide range of computational tasks.
Probability: (ℝ≥0, +, ×, 0, 1) — total probability of paths
Tropical: (ℝ∪{∞}, min, +, ∞, 0) — shortest/best path
Log: (ℝ∪{∞}, ⊕_log, +, ∞, 0), where x ⊕_log y = −log(e^(−x) + e^(−y)) — numerically stable probability computations in the negative-log domain
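The semiring abstraction translates directly into code. The following minimal Python sketch (all names hypothetical, not from any particular library) bundles a semiring's operations into a small record and computes a string's weight in a WFA using the standard forward recursion rather than enumerating paths explicitly:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕: combine alternative paths
    times: Callable[[float, float], float]  # ⊗: combine weights along a path
    zero: float                             # identity for ⊕
    one: float                              # identity for ⊗

PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

def string_weight(s, sr, init, trans, final):
    """Weight of string s: ⊕-sum over accepting paths of the ⊗-product
    of weights along each path.

    init/final map states to weights; trans maps (state, symbol) to a
    dict {next_state: weight}. The forward recursion keeps, for each
    state, the ⊕-sum of weights of all partial paths reaching it.
    """
    fwd = dict(init)
    for sym in s:
        nxt = {}
        for q, w in fwd.items():
            for r, tw in trans.get((q, sym), {}).items():
                nxt[r] = sr.plus(nxt.get(r, sr.zero), sr.times(w, tw))
        fwd = nxt
    total = sr.zero
    for q, w in fwd.items():
        if q in final:
            total = sr.plus(total, sr.times(w, final[q]))
    return total
```

Swapping PROB for TROPICAL in a call to `string_weight` changes the result from the total probability of a string to its minimum path cost, with no change to the recursion itself.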
Algorithms and Operations
The power of weighted automata lies in the suite of generic algorithms that operate over any semiring. Weighted determinization converts a nondeterministic WFA into an equivalent deterministic one (when the semiring and the automaton satisfy certain conditions, such as the twins property). Shortest-distance algorithms extract the best path — under the tropical semiring, the minimum-weight path. Weighted intersection and composition enable the combination of multiple knowledge sources — for example, composing an acoustic model transducer with a language model automaton in speech recognition.
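As a sketch of one such generic operation, the hypothetical Python function below performs a naive product construction for weighted intersection of two acceptors over a shared alphabet: result states are pairs of source states, and matching transitions combine their weights with the semiring's ⊗ (e.g., + under the tropical semiring). Production implementations build the product lazily; this eager version is for illustration only:

```python
def intersect(trans1, trans2, times):
    """Weighted intersection of two WFAs over the same alphabet.

    Each trans maps (state, symbol) -> {next_state: weight}. The result's
    states are pairs (q1, q2); a transition exists where both inputs have
    a transition on the same symbol, with weight w1 ⊗ w2.
    """
    out = {}
    for (q1, sym), arcs1 in trans1.items():
        for (q2, sym2), arcs2 in trans2.items():
            if sym2 != sym:
                continue
            dest = out.setdefault(((q1, q2), sym), {})
            for r1, w1 in arcs1.items():
                for r2, w2 in arcs2.items():
                    dest[(r1, r2)] = times(w1, w2)
    return out
```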
The OpenFst library, developed at Google by Cyril Allauzen, Michael Riley, and colleagues (building on earlier AT&T tools), provides efficient C++ implementations of weighted finite-state transducer (WFST) operations. WFST-based decoders compose acoustic models, pronunciation lexicons, and language models into a single search network for speech recognition. This WFST framework, pioneered by Mohri, Pereira, and Riley at AT&T Labs and continued at Google, became the standard architecture for large-vocabulary speech recognition systems before the advent of end-to-end neural approaches.
Applications in NLP
In statistical NLP, weighted automata appear everywhere. N-gram language models are naturally represented as weighted automata over the tropical or probability semiring. Lattices and confusion networks in speech recognition and machine translation are weighted acyclic automata. The forward-backward algorithm for hidden Markov models and the inside-outside algorithm for probabilistic context-free grammars are instances of generic semiring computations over weighted structures.
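To illustrate how one recurrence covers both of the algorithm families just mentioned, here is a hypothetical Python sketch of the HMM trellis recursion parametrized by the semiring operations: with (+, ×) over probabilities it is the forward algorithm, and with (min, +) over negative log-probabilities it is Viterbi decoding's score computation:

```python
def hmm_score(obs, init, trans, emit, plus, times, zero):
    """Generic HMM trellis recursion over a semiring (plus, times, zero).

    init[s]: initial weight of state s; trans[r][s]: transition weight
    r -> s; emit[s][o]: emission weight of observation o in state s.
    """
    states = list(init)
    # Initialize the trellis column for the first observation.
    alpha = {s: times(init[s], emit[s][obs[0]]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            acc = zero
            for r in states:
                acc = plus(acc, times(alpha[r], trans[r][s]))
            nxt[s] = times(acc, emit[s][o])
        alpha = nxt
    # ⊕-combine the final column into a single score.
    total = zero
    for s in states:
        total = plus(total, alpha[s])
    return total
```

The same loop structure, handed different semiring operations, yields total probability or best-path score; that interchangeability is exactly the point of the semiring abstraction.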
Recent research has connected weighted automata to neural sequence models, showing that second-order recurrent neural networks with linear activations are expressively equivalent to weighted automata over the real semiring. This connection provides tools for analyzing the expressiveness and limitations of neural language models through the well-developed theory of weighted automata, bridging the gap between classical formal language theory and modern deep learning.
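The correspondence is easiest to see in the matrix view of a WFA over the real semiring: the weight of w = w1…wn is αᵀ A_{w1} ⋯ A_{wn} β, where α is an initial-weight vector, β a final-weight vector, and each symbol has a transition matrix. The state vector updated by one matrix per input symbol is precisely a linear recurrent hidden state. A minimal sketch (illustrative names, plain Python lists as matrices):

```python
def wfa_weight(word, alpha, A, beta):
    """Weight of a word under a real-semiring WFA in matrix form:
    f(w1...wn) = alpha^T A[w1] ... A[wn] beta.

    The row vector `state` plays the role of a linear RNN's hidden
    state, updated multiplicatively by one matrix per input symbol.
    """
    state = alpha[:]                       # one entry per automaton state
    for sym in word:
        M = A[sym]
        state = [sum(state[i] * M[i][j] for i in range(len(state)))
                 for j in range(len(M[0]))]
    return sum(s * b for s, b in zip(state, beta))
```

For example, with α = [1, 0], β = [0, 1], and a single matrix A["a"] that moves all weight from state 0 to state 1, the automaton assigns weight 1 to "a" and 0 to every other string.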