Weighted finite automata (WFAs) extend classical finite-state automata by associating a weight — drawn from a semiring — with each transition, initial state, and final state. Rather than simply accepting or rejecting a string, a weighted automaton assigns a weight to each string, computed by combining transition weights along accepting paths. By choosing different semirings — the real semiring for probabilities, the tropical semiring for shortest paths, the Boolean semiring for standard acceptance — weighted automata provide a unified framework for diverse computational tasks in speech recognition, machine translation, and statistical NLP.
Semirings and Weight Computation
A semiring (K, ⊕, ⊗, 0̄, 1̄) consists of a set K with two operations, ⊕ (addition, for combining alternative paths) and ⊗ (multiplication, for combining weights along a single path), together with their identities: 0̄ is the identity for ⊕ and annihilates under ⊗, while 1̄ is the identity for ⊗. The weight assigned to a string w is computed by ⊗-multiplying the weights along each accepting path for w, then ⊕-summing over all accepting paths; a string with no accepting path receives 0̄. This abstraction encompasses a wide range of computational tasks.
Probability: (ℝ≥0, +, ×, 0, 1) — total probability of paths
Tropical: (ℝ∪{∞}, min, +, ∞, 0) — shortest/best path
Log: (ℝ∪{∞}, ⊕_log, +, ∞, 0), where x ⊕_log y = −log(e^(−x) + e^(−y)) — numerically stable probability computations in the negative-log domain
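The semiring abstraction translates directly into code. The following minimal Python sketch (all names hypothetical, not from any particular library) bundles a semiring's operations into a small record and computes a string's weight in a WFA using the standard forward recursion rather than enumerating paths explicitly:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕: combine alternative paths
    times: Callable[[float, float], float]  # ⊗: combine weights along a path
    zero: float                             # identity for ⊕
    one: float                              # identity for ⊗

PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

def string_weight(s, sr, init, trans, final):
    """Weight of string s: ⊕-sum over accepting paths of the ⊗-product
    of weights along each path.

    init/final map states to weights; trans maps (state, symbol) to a
    dict {next_state: weight}. The forward recursion keeps, for each
    state, the ⊕-sum of weights of all partial paths reaching it.
    """
    fwd = dict(init)
    for sym in s:
        nxt = {}
        for q, w in fwd.items():
            for r, tw in trans.get((q, sym), {}).items():
                nxt[r] = sr.plus(nxt.get(r, sr.zero), sr.times(w, tw))
        fwd = nxt
    total = sr.zero
    for q, w in fwd.items():
        if q in final:
            total = sr.plus(total, sr.times(w, final[q]))
    return total
```

Swapping PROB for TROPICAL in a call to `string_weight` changes the result from the total probability of a string to its minimum path cost, with no change to the recursion itself.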
Algorithms and Operations
The power of weighted automata lies in the suite of generic algorithms that operate over any semiring. Weighted determinization converts a nondeterministic WFA into an equivalent deterministic one (when the semiring and the automaton satisfy certain conditions, such as the twins property). Shortest-distance algorithms extract the best path — under the tropical semiring, the minimum-weight path. Weighted intersection and composition enable the combination of multiple knowledge sources — for example, composing an acoustic model transducer with a language model automaton in speech recognition.
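As a sketch of one such generic operation, the hypothetical Python function below performs a naive product construction for weighted intersection of two acceptors over a shared alphabet: result states are pairs of source states, and matching transitions combine their weights with the semiring's ⊗ (e.g., + under the tropical semiring). Production implementations build the product lazily; this eager version is for illustration only:

```python
def intersect(trans1, trans2, times):
    """Weighted intersection of two WFAs over the same alphabet.

    Each trans maps (state, symbol) -> {next_state: weight}. The result's
    states are pairs (q1, q2); a transition exists where both inputs have
    a transition on the same symbol, with weight w1 ⊗ w2.
    """
    out = {}
    for (q1, sym), arcs1 in trans1.items():
        for (q2, sym2), arcs2 in trans2.items():
            if sym2 != sym:
                continue
            dest = out.setdefault(((q1, q2), sym), {})
            for r1, w1 in arcs1.items():
                for r2, w2 in arcs2.items():
                    dest[(r1, r2)] = times(w1, w2)
    return out
```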
The OpenFst library, developed at Google by Cyril Allauzen, Michael Riley, and colleagues (building on earlier AT&T tools), provides efficient C++ implementations of weighted finite-state transducer (WFST) operations. WFST-based decoders compose acoustic models, pronunciation lexicons, and language models into a single search network for speech recognition. This WFST framework, pioneered by Mohri, Pereira, and Riley at AT&T Labs and continued at Google, became the standard architecture for large-vocabulary speech recognition systems before the advent of end-to-end neural approaches.
Applications in NLP
In statistical NLP, weighted automata appear everywhere. N-gram language models are naturally represented as weighted automata over the tropical or probability semiring. Lattices and confusion networks in speech recognition and machine translation are weighted acyclic automata. The forward-backward algorithm for hidden Markov models and the inside-outside algorithm for probabilistic context-free grammars are instances of generic semiring computations over weighted structures.
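To illustrate how one recurrence covers both of the algorithm families just mentioned, here is a hypothetical Python sketch of the HMM trellis recursion parametrized by the semiring operations: with (+, ×) over probabilities it is the forward algorithm, and with (min, +) over negative log-probabilities it is Viterbi decoding's score computation:

```python
def hmm_score(obs, init, trans, emit, plus, times, zero):
    """Generic HMM trellis recursion over a semiring (plus, times, zero).

    init[s]: initial weight of state s; trans[r][s]: transition weight
    r -> s; emit[s][o]: emission weight of observation o in state s.
    """
    states = list(init)
    # Initialize the trellis column for the first observation.
    alpha = {s: times(init[s], emit[s][obs[0]]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            acc = zero
            for r in states:
                acc = plus(acc, times(alpha[r], trans[r][s]))
            nxt[s] = times(acc, emit[s][o])
        alpha = nxt
    # ⊕-combine the final column into a single score.
    total = zero
    for s in states:
        total = plus(total, alpha[s])
    return total
```

The same loop structure, handed different semiring operations, yields total probability or best-path score; that interchangeability is exactly the point of the semiring abstraction.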
Recent research has connected weighted automata to neural sequence models, showing that second-order recurrent neural networks with linear activations are expressively equivalent to weighted automata over the real semiring. This connection provides tools for analyzing the expressiveness and limitations of neural language models through the well-developed theory of weighted automata, bridging the gap between classical formal language theory and modern deep learning.
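The correspondence is easiest to see in the matrix view of a WFA over the real semiring: the weight of w = w1…wn is αᵀ A_{w1} ⋯ A_{wn} β, where α is an initial-weight vector, β a final-weight vector, and each symbol has a transition matrix. The state vector updated by one matrix per input symbol is precisely a linear recurrent hidden state. A minimal sketch (illustrative names, plain Python lists as matrices):

```python
def wfa_weight(word, alpha, A, beta):
    """Weight of a word under a real-semiring WFA in matrix form:
    f(w1...wn) = alpha^T A[w1] ... A[wn] beta.

    The row vector `state` plays the role of a linear RNN's hidden
    state, updated multiplicatively by one matrix per input symbol.
    """
    state = alpha[:]                       # one entry per automaton state
    for sym in word:
        M = A[sym]
        state = [sum(state[i] * M[i][j] for i in range(len(state)))
                 for j in range(len(M[0]))]
    return sum(s * b for s, b in zip(state, beta))
```

For example, with α = [1, 0], β = [0, 1], and a single matrix A["a"] that moves all weight from state 0 to state 1, the automaton assigns weight 1 to "a" and 0 to every other string.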