Grammar Induction

Grammar induction is the task of automatically learning a grammar (context-free or dependency) from data, either from raw text (unsupervised) or from partially annotated corpora (semi-supervised).

θ* = argmax_θ P(w_1...w_n | θ) (MLE for grammar parameters from unannotated text)

Grammar induction, also known as grammar learning or grammatical inference, is the problem of automatically inferring a grammar from data without explicit syntactic annotation. In its strongest form (unsupervised grammar induction), the learner receives only raw sentences and must discover the syntactic rules governing the language. In semi-supervised settings, limited annotation or linguistic knowledge guides the learning process. Grammar induction is both a practical goal (reducing the need for expensive treebank annotation) and a scientific question (understanding how children acquire syntactic knowledge).

Approaches

Inside-Outside Algorithm for PCFGs

E-step: compute expected rule counts using inside (β) and outside (α) probabilities
β(A, i, j) = P(w_i ... w_j | A) (inside probability)
α(A, i, j) = P(w_1 ... w_{i−1}, A, w_{j+1} ... w_n) (outside probability)

M-step: re-estimate rule probabilities
q(A → BC) = E[count(A → BC)] / E[count(A)]

Iterate until convergence
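The M-step above is just a per-symbol normalization of expected counts. A minimal sketch, using made-up expected counts in place of real E-step output:

```python
# Minimal sketch of the M-step: re-estimate each rule's probability by
# normalizing its expected count by the total expected count of its
# left-hand side. The counts below are made-up illustrative values,
# not the output of a real E-step.
expected_counts = {
    ("S", ("NP", "VP")): 4.0,
    ("NP", ("Det", "N")): 6.0,
    ("NP", ("NP", "PP")): 2.0,
}

def m_step(counts):
    # E[count(A)] is the sum of E[count(A -> B C)] over all rules with LHS A.
    totals = {}
    for (lhs, _), c in counts.items():
        totals[lhs] = totals.get(lhs, 0.0) + c
    # q(A -> B C) = E[count(A -> B C)] / E[count(A)]
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

q = m_step(expected_counts)
print(q[("NP", ("Det", "N"))])  # 6.0 / (6.0 + 2.0) = 0.75
```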

The classical approach to unsupervised PCFG induction is the inside-outside algorithm (Baker, 1979), an instance of expectation-maximization (EM) applied to PCFGs. Starting from a random or uniform initialization, the algorithm alternates between computing expected rule counts under the current grammar (E-step, using inside and outside probabilities) and re-estimating rule probabilities from those counts (M-step). In practice, however, it tends to converge to poor local optima and does not recover linguistically meaningful grammars from raw text without additional inductive biases.
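The inside probabilities in the E-step can be computed with a CKY-style dynamic program. A minimal sketch for a toy grammar in Chomsky normal form (the grammar and its probabilities are invented for illustration; a full E-step would also need the symmetric outside pass):

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. All rules and probabilities here are
# illustrative assumptions, not learned values.
binary = {  # (A, (B, C)): P(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical = {  # (A, w): P(A -> w)
    ("Det", "the"): 1.0,
    ("N", "dog"): 0.5,
    ("N", "cat"): 0.5,
    ("V", "chased"): 1.0,
}

def inside(words):
    """beta[(A, i, j)] = P(words[i..j] | A), filled bottom-up as in CKY."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):  # width-1 spans: lexical rules
        for (A, word), p in lexical.items():
            if word == w:
                beta[(A, i, i)] += p
    for span in range(2, n + 1):  # wider spans: binary rules
        for i in range(n - span + 1):
            j = i + span - 1
            for (A, (B, C)), p in binary.items():
                for k in range(i, j):  # split point
                    beta[(A, i, j)] += p * beta[(B, i, k)] * beta[(C, k + 1, j)]
    return beta

words = "the dog chased the cat".split()
beta = inside(words)
print(beta[("S", 0, len(words) - 1)])  # sentence probability: 0.25
```

The sentence probability β(S, 1, n) is exactly the likelihood P(w_1 ... w_n | θ) that EM maximizes.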

Inductive Biases and Constraints

Successful grammar induction requires strong inductive biases. Important biases include: structural bias (preferring shorter derivations or specific branching patterns via Bayesian priors), distributional bias (words in similar contexts should have similar syntactic behavior), and typological bias (exploiting universal tendencies such as head-directionality consistency). The Dependency Model with Valence (DMV) of Klein and Manning (2004) achieved the first broadly successful unsupervised dependency induction for English by conditioning each head's decision to stop generating dependents on whether it has already generated any in that direction (its valence).
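One way to make the DMV's generative story concrete is a sampler that follows its valence-conditioned STOP decisions. The tag set and every probability table below are invented for illustration, not the parameters of Klein and Manning (2004):

```python
import random

# Hedged sketch of the DMV generative story: each head decides, per
# direction, whether to STOP generating dependents, with the STOP
# probability conditioned on whether it has already generated a
# dependent in that direction (its valence). All numbers are made up.
random.seed(0)

stop = {  # P(STOP | head tag, direction, has generated a dependent yet)
    ("V", "left", False): 0.4, ("V", "left", True): 0.8,
    ("V", "right", False): 0.4, ("V", "right", True): 0.8,
    ("N", "left", False): 0.5, ("N", "left", True): 0.9,
    ("N", "right", False): 1.0, ("N", "right", True): 1.0,
    ("Det", "left", False): 1.0, ("Det", "left", True): 1.0,
    ("Det", "right", False): 1.0, ("Det", "right", True): 1.0,
}
attach = {  # P(dependent tag | head tag, direction)
    ("V", "left"): {"N": 1.0},
    ("V", "right"): {"N": 1.0},
    ("N", "left"): {"Det": 1.0},
}

def generate(head):
    """Sample a dependency subtree headed by `head`."""
    node = {"head": head, "left": [], "right": []}
    for direction in ("left", "right"):
        has_dep = False
        while random.random() >= stop[(head, direction, has_dep)]:
            dist = attach[(head, direction)]
            tags, weights = zip(*dist.items())
            node[direction].append(generate(random.choices(tags, weights)[0]))
            has_dep = True
    return node

tree = generate("V")  # e.g. a verb with noun dependents on either side
```

Learning inverts this story: EM (or a spectral or Bayesian method) estimates the stop and attach tables from raw sentences.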

Connection to Language Acquisition
Grammar induction relates directly to the poverty of the stimulus argument in linguistics: children learn syntactic rules from limited positive evidence without explicit correction. Computational models of grammar induction help clarify what inductive biases are necessary and sufficient for successful language acquisition, contributing to debates about the innateness of linguistic knowledge.

Neural Grammar Induction

Recent work applies neural networks to grammar induction. Neural PCFG models (Kim et al., 2019) parameterize rule probabilities with neural networks over learned symbol embeddings, so that rules sharing symbols share parameters and generalize better. Compound PCFGs add a global sentence-level latent variable that conditions all rule choices, capturing inter-rule correlations that standard PCFGs miss. These models set strong unsupervised constituency parsing results, with F1 scores around 55-65% on the Penn Treebank: still far below supervised parsers, but a significant advance over classical EM-based approaches.
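The core idea of the neural parameterization can be sketched without a deep-learning framework: rule probabilities come from a softmax over scores computed from shared symbol embeddings, rather than being free parameters. The rule set, embedding dimension, and scoring function below are illustrative assumptions, not the architecture of Kim et al. (2019):

```python
import math
import random

# Sketch of a neural PCFG parameterization: each rule's score is a
# function of learned symbol embeddings, and a softmax over all rules
# with the same left-hand side yields a valid rule distribution.
# Everything below (rules, dimension, scorer) is an assumption.
random.seed(0)
DIM = 8
rules = [("S", "NP", "VP"), ("S", "VP", "NP"), ("NP", "NP", "VP")]
symbols = {sym for rule in rules for sym in rule}
emb = {A: [random.gauss(0.0, 1.0) for _ in range(DIM)] for A in symbols}

def score(rule):
    A, B, C = rule
    # Illustrative scorer: parent embedding dotted with the elementwise
    # product of the two child embeddings.
    return sum(a * b * c for a, b, c in zip(emb[A], emb[B], emb[C]))

def rule_probs(lhs):
    """Softmax over all rules whose left-hand side is `lhs`."""
    scored = [(r, math.exp(score(r))) for r in rules if r[0] == lhs]
    z = sum(s for _, s in scored)
    return {r: s / z for r, s in scored}

probs = rule_probs("S")  # a proper distribution over the two S rules
```

Because probabilities flow through shared embeddings, updating one rule's parameters moves related rules too, which is one source of the improved generalization.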

References

  1. Klein, D., & Manning, C. D. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. Proceedings of ACL 2004, 478–485. https://doi.org/10.3115/1218955.1219016
  2. Kim, Y., Dyer, C., & Rush, A. M. (2019). Compound probabilistic context-free grammars for grammar induction. Proceedings of ACL 2019, 2369–2385. https://doi.org/10.18653/v1/P19-1228
  3. Baker, J. K. (1979). Trainable grammars for speech recognition. Proceedings of the Spring Conference of the Acoustical Society of America, 547–550.
  4. Clark, A. (2001). Unsupervised induction of stochastic context-free grammars using distributional clustering. Proceedings of CoNLL 2001, 105–112. https://doi.org/10.3115/1117822.1117840
