Gold's theorem, proved by E. Mark Gold in his seminal 1967 paper "Language Identification in the Limit," establishes that any class of languages containing all finite languages and at least one infinite language cannot be identified in the limit from positive examples alone. This result applies to all four levels of the Chomsky hierarchy — the classes of regular, context-free, context-sensitive, and recursively enumerable languages are all unlearnable from positive data in Gold's framework. The theorem has had enormous influence on debates about language acquisition, grammar induction, and the necessity of innate linguistic knowledge.
Identification in the Limit
In Gold's model, a learner receives an infinite sequence of strings from the target language (a text, or positive presentation) and after each example outputs a hypothesis grammar. The learner identifies the language in the limit if, after some finite point, it always outputs the correct grammar. A class of languages is identifiable in the limit from text if there exists a single learner that succeeds for every language in the class and every text for that language.
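The definition can be made concrete with a minimal Python sketch (the learner, language, and names here are toy examples of my own, not from Gold's paper): a learner that conjectures exactly the set of strings seen so far identifies every finite language in the limit from text.

```python
# Toy illustration of identification in the limit (illustrative names).
def finite_language_learner(text):
    """Yield a hypothesis (a finite language, as a frozenset) after each example."""
    seen = set()
    for example in text:
        seen.add(example)
        yield frozenset(seen)

# A text for the finite language {"a", "ab", "abb"} may repeat strings
# and present them in any order.
target = frozenset({"a", "ab", "abb"})
text = ["ab", "a", "ab", "abb", "a", "abb", "ab", "a"]

hypotheses = list(finite_language_learner(text))
# Once every string of the target has appeared, the hypothesis is correct
# and never changes again: identification in the limit.
print(hypotheses[-1] == target)  # True
```

Note that this strategy succeeds for every finite language and every text for it, which is exactly why finite languages are the easy case in Gold's framework.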
Theorem (Gold, 1967). Suppose a class of languages C over an alphabet Σ

(1) contains all finite languages over Σ, and
(2) contains at least one infinite language L∞.

Then C is not identifiable in the limit from positive data alone.

Proof sketch: Enumerate L∞ as s1, s2, s3, …. An adversary presents s1 repeatedly until the learner conjectures the finite language {s1} (it must eventually, since {s1} is in C and the presentation so far could be a text for it); then mixes in s2 and repeats until the learner conjectures {s1, s2}; and so on. The resulting presentation is a legitimate text for L∞, yet the learner changes its hypothesis infinitely often and never converges. The underlying difficulty: any finite sample is consistent with some finite language in C, so positive data can never force a permanent commitment to L∞.
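The dilemma in the proof sketch can be seen in a small Python experiment (both learners and the language a+ are my own toy choices): a learner that sticks to finite conjectures never reaches L∞, while a learner that ever jumps to L∞ can be trapped on a text for a finite language.

```python
# Two toy learners over strings of a's; L_inf is the infinite language
# {"a", "aa", "aaa", ...}. Texts are infinite; we inspect finite prefixes.

def cautious(examples):
    # Always conjectures the finite set of strings seen so far.
    return frozenset(examples)

def bold(examples):
    # Jumps to the conjecture L_inf after seeing 3 distinct strings.
    return "L_inf" if len(set(examples)) >= 3 else frozenset(examples)

# On a text for L_inf, the cautious learner never conjectures L_inf:
prefix = ["a" * k for k in range(1, 50)]
print(cautious(prefix) != "L_inf")   # True: still a finite conjecture

# On a text for the finite language {"a", "aa", "aaa"} (which just
# repeats its three strings forever), the bold learner is stuck wrong:
finite_text = ["a", "aa", "aaa"] * 20
print(bold(finite_text) == "L_inf")  # True: conjectures L_inf, but the target is finite
```

No strategy escapes this trade-off, which is the content of the theorem.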
Implications for Language Acquisition
Gold's theorem was immediately seized upon by proponents of nativist theories of language acquisition. If children learn language primarily from positive examples (utterances they hear), and if the class of natural languages is superfinite (containing all finite languages and some infinite ones), then language learning from experience alone is impossible — some innate bias or constraint must restrict the hypothesis space. This argument became a cornerstone of the poverty-of-the-stimulus debate, supporting Chomsky's claim that Universal Grammar provides the necessary innate constraints.
Gold's model has been criticized on several grounds: children do receive some negative evidence (corrections and reformulations); the model discards distributional information, treating the input as bare set membership; and it demands exact identification rather than approximate learning. Stochastic approaches, Bayesian models, and the PAC learning framework all provide alternative formalizations that yield more optimistic learnability results. The debate highlights how formal results depend critically on the assumptions built into the learning model.
Positive Results and Restricted Classes
While Gold's negative result applies to the full Chomsky hierarchy classes, restricted subclasses can be learned from positive data. Dana Angluin showed that the class of k-reversible languages is identifiable in the limit from text, and that pattern languages are likewise learnable from positive data, a line of work extended by Shinohara and others. The field of grammatical inference has developed a rich taxonomy of learnable subclasses, often characterized by structural restrictions that limit the hypothesis space sufficiently for convergence.
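To give a flavor of positive-data learning in a restricted class, here is a deliberately simplified Python sketch (my own toy, far simpler than Angluin's actual pattern-language algorithm): it infers patterns of the form prefix + x + suffix, where x stands for any nonempty string, and assumes every example is strictly longer than the inferred prefix plus suffix.

```python
import os

def learn_pattern(examples):
    """Infer the most specific prefix-x-suffix pattern consistent with the examples."""
    prefix = os.path.commonprefix(examples)
    suffix = os.path.commonprefix([s[::-1] for s in examples])[::-1]
    return prefix, suffix

def matches(pattern, s):
    """Check whether s is prefix + x + suffix for some nonempty x."""
    prefix, suffix = pattern
    return (s.startswith(prefix) and s.endswith(suffix)
            and len(s) > len(prefix) + len(suffix))

p = learn_pattern(["ab1cd", "ab22cd", "abXYZcd"])
print(p)  # ('ab', 'cd')
print(matches(p, "ab777cd"), matches(p, "abcd"))  # True False
```

The structural restriction is doing the work here: because the hypothesis space is so narrow, the most specific consistent pattern converges as more positive examples arrive, which is the general shape of the positive results in grammatical inference.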
Gold's theorem remains a foundational reference point in computational linguistics. It reminds us that learning language from examples is a fundamentally constrained problem, and that the success of both human language acquisition and machine learning algorithms depends on inductive biases — whether these biases come from Universal Grammar, distributional statistics, or the architecture of a neural network. Understanding these biases, and their formal consequences for learnability, remains one of the deepest questions in the field.