Computational Linguistics
PAC Learning

Probably Approximately Correct learning theory provides a rigorous mathematical framework for analyzing when a learning algorithm can reliably generalize from finite training data, with direct applications to grammar induction and NLP.

P(error(h) ≤ ε) ≥ 1 − δ

Probably Approximately Correct (PAC) learning theory, introduced by Leslie Valiant in 1984, provides the foundational mathematical framework for analyzing the sample complexity and computational feasibility of learning from examples. In the PAC model, a learner receives random examples drawn from an unknown distribution, labeled according to an unknown target concept from a known concept class. The learner must output a hypothesis that, with high probability (at least 1 − δ), has low error (at most ε) with respect to the underlying distribution. PAC learning theory has profoundly influenced computational linguistics by providing tools to analyze when grammatical structures can be reliably inferred from linguistic data.

The PAC Model

Formally, a concept class C over instance space X is PAC-learnable if there exists an algorithm A and a polynomial p(·, ·, ·, ·) such that for every concept c ∈ C, every distribution D over X, and every ε, δ > 0, the algorithm A, given access to examples drawn from D and labeled by c, runs in time p(1/ε, 1/δ, n, size(c)) and outputs a hypothesis h with P_D(h(x) ≠ c(x)) ≤ ε with probability at least 1 − δ.
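The definition can be made concrete with a toy concept class. The sketch below (our own illustration, with hypothetical names, not drawn from the literature above) PAC-learns threshold concepts c_t(x) = 1 iff x ≥ t over X = [0, 1] with D the uniform distribution, using the obvious consistent learner: output the smallest positively labeled example.

```python
import random

# Toy illustration: PAC-learning thresholds c_t(x) = 1 iff x >= t,
# with X = [0, 1] and D = Uniform[0, 1].

def draw_sample(m, t, rng):
    """Draw m examples from Uniform[0, 1], labeled by the target threshold t."""
    return [(x, x >= t) for x in (rng.random() for _ in range(m))]

def learn_threshold(sample):
    """A consistent learner: hypothesize the smallest positively labeled point."""
    positives = [x for x, label in sample if label]
    return min(positives) if positives else 1.0

def true_error(h, t):
    """Under Uniform[0, 1], P(h(x) != c_t(x)) is the length of the gap between t and h."""
    return abs(h - t)

rng = random.Random(0)
t = 0.3
h = learn_threshold(draw_sample(1000, t, rng))
print(true_error(h, t))  # small with high probability, as the PAC guarantee predicts
```

Because the learner only errs on the interval between t and the smallest positive sample, its error shrinks at roughly a 1/m rate, matching the sample-complexity bound below.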

Sample Complexity Bound

For a finite concept class C,
m ≥ (1/ε)(ln|C| + ln(1/δ))
samples suffice for PAC learning.
For an infinite class with VC dimension d, the analogous bound (up to a constant factor) is
m = O((1/ε)(d·ln(1/ε) + ln(1/δ)))
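Plugging numbers into these bounds is straightforward. The helpers below (our own naming) compute both bounds; for example, a class of 2^20 concepts with ε = 0.1 and δ = 0.05 needs only a few hundred samples.

```python
import math

def pac_sample_bound(class_size, eps, delta):
    """Finite-class bound: m >= (1/eps)(ln|C| + ln(1/delta))."""
    return math.ceil((math.log(class_size) + math.log(1.0 / delta)) / eps)

def pac_sample_bound_vc(d, eps, delta):
    """VC-dimension bound, ignoring the constant factor the theory leaves unspecified."""
    return math.ceil((d * math.log(1.0 / eps) + math.log(1.0 / delta)) / eps)

print(pac_sample_bound(2 ** 20, eps=0.1, delta=0.05))  # 169
```

Note how the finite-class bound depends only logarithmically on |C|: doubling the class adds just ln 2 / ε extra samples.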

VC Dimension and Language Learning

The Vapnik-Chervonenkis (VC) dimension measures the effective complexity of a hypothesis class and determines the sample complexity of PAC learning. A concept class with VC dimension d requires Θ(d/ε) samples for PAC learning, up to logarithmic factors. In computational linguistics, VC dimension analysis has been applied to classes of formal grammars: for example, the class of k-reversible automata has finite VC dimension and is therefore PAC-learnable, while the class of all regular languages has infinite VC dimension in certain formulations, suggesting inherent difficulty.
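The definition of VC dimension, the largest set of points the class can shatter, can be checked by brute force for small cases. The sketch below (our own illustration) uses intervals on the real line, a class with VC dimension 2: any two points can be labeled in all four ways, but no interval realizes the labeling (+, −, +) on three points.

```python
# Brute-force shattering check to make the VC-dimension definition concrete.
# Hypothesis class (chosen for illustration): intervals [a, b] on the real line,
# labeling x positive iff a <= x <= b.

def interval_labels(points, a, b):
    return tuple(a <= x <= b for x in points)

def shattered(points):
    """Can intervals realize all 2^n labelings of these points?"""
    endpoints = sorted(points)
    # Intervals with endpoints at the points themselves suffice, since an
    # interval always selects a contiguous run of the sorted points.
    achievable = {tuple(False for _ in points)}  # the empty labeling
    for lo in endpoints:
        for hi in endpoints:
            if lo <= hi:
                achievable.add(interval_labels(points, lo, hi))
    return len(achievable) == 2 ** len(points)

print(shattered([1.0, 2.0]))        # True: two points are shattered
print(shattered([1.0, 2.0, 3.0]))   # False: (+, -, +) is unachievable
```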

PAC Learning and Grammar Induction

The PAC framework has been applied to study the learnability of various grammar classes. While Gold's theorem shows that many grammar classes cannot be learned from positive examples alone in the limit, PAC learning relaxes the requirements: approximate learning from random examples is often possible. Results by Angluin, Pitt, and others show that the class of regular languages is not efficiently PAC-learnable under standard cryptographic assumptions, but restricted subclasses, such as Angluin's k-reversible languages, remain learnable in polynomial time.

Extensions and Impact

The basic PAC model has been extended in numerous ways relevant to language learning. The model of learning with membership queries (where the learner can ask whether a specific string belongs to the language) dramatically increases learnability — Angluin's L* algorithm can efficiently learn any regular language with membership and equivalence queries. The mistake-bound model, agnostic PAC learning (which does not assume the target is in the hypothesis class), and online learning models have all been applied to NLP tasks including text classification, sequence labeling, and parsing.
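To illustrate why membership queries add power (in a far simpler setting than Angluin's L* algorithm, which this sketch does not implement), the toy learner below actively queries an oracle about every string up to a length bound, pinning down the target exactly on that domain rather than waiting for random examples to cover it. All names here are our own.

```python
from itertools import product

def learn_with_queries(membership, alphabet, max_len):
    """Query the membership oracle on every string up to max_len and
    return the resulting labeled table."""
    table = {}
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            table[w] = membership(w)
    return table

# Target: strings over {a, b} with an even number of 'a's (a regular language).
target = lambda w: w.count("a") % 2 == 0
table = learn_with_queries(target, "ab", 3)
print(table["aa"], table["ab"])  # True False
```

The exponential blow-up in max_len is exactly what L* avoids: by interleaving membership and equivalence queries, it learns a DFA for the whole language with polynomially many queries in the size of the minimal automaton.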

PAC learning theory continues to provide the conceptual tools for understanding generalization in NLP. When a neural parser trained on a treebank achieves high accuracy on held-out data, PAC theory explains why: the effective complexity of the hypothesis class (controlled by the network architecture and regularization) is balanced against the amount of training data. Understanding this balance is essential for designing data-efficient NLP systems, particularly for low-resource languages.

References

  1. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142. https://doi.org/10.1145/1968.1972
  2. Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory. MIT Press.
  3. Angluin, D. (1988). Queries and concept learning. Machine Learning, 2(4), 319–342. https://doi.org/10.1007/BF00116828
  4. de la Higuera, C. (2010). Grammatical Inference: Learning Automata and Grammars. Cambridge University Press. https://doi.org/10.1017/CBO9781139194655