Computational Linguistics
About

Recursively Enumerable Languages

Recursively enumerable languages sit at the top of the Chomsky hierarchy, corresponding to unrestricted grammars and Turing machines, and defining the absolute boundary of what can be computed.

L(M) = { w ∈ Σ* | M halts and accepts w }

Recursively enumerable (RE) languages, also called Type 0 languages, form the most inclusive class in the Chomsky hierarchy. A language is recursively enumerable if there exists a Turing machine that accepts every string in the language — though it may loop forever on strings not in the language. RE languages are generated by unrestricted grammars, where production rules have no constraints on form. This class delineates the ultimate boundary of computability: any language that is not recursively enumerable cannot be recognized by any computational device.

Unrestricted Grammars and Turing Machines

An unrestricted grammar (Type 0) permits production rules of the form α → β where α contains at least one non-terminal, and β is any string of terminals and non-terminals, including the empty string. Unlike context-sensitive grammars, unrestricted grammars allow rules that shorten the sentential form. The equivalence between unrestricted grammars and Turing machines is a fundamental theorem: a language is generated by some unrestricted grammar if and only if it is accepted by some Turing machine.

Recursive vs. Recursively Enumerable Recursive (decidable): M halts on all inputs, accepting or rejecting
Recursively enumerable (semi-decidable): M halts and accepts if w ∈ L,
but may loop forever if w ∉ L
L is recursive ⟺ both L and its complement L̄ are RE

Undecidability

The distinction between recursive and recursively enumerable languages gives rise to the fundamental concept of undecidability. The halting problem — whether a given Turing machine halts on a given input — is the canonical example of a problem that is recursively enumerable but not recursive. Alan Turing's 1936 proof that no general algorithm can solve the halting problem established the first absolute limitation on computation. Many problems in formal language theory inherit this undecidability: the equivalence problem for context-free grammars, the emptiness problem for context-sensitive grammars, and the ambiguity problem for CFGs are all undecidable.

Rice's Theorem

Rice's theorem generalizes undecidability by showing that every non-trivial semantic property of Turing machines is undecidable. This means that no algorithm can determine whether the language of a given Turing machine has any particular non-trivial property — whether it is empty, finite, regular, context-free, or equal to any other specified language.

Relevance to Computational Linguistics

While natural languages are almost certainly not recursively enumerable in any meaningful sense (they are finite collections for any fixed set of speakers), the theory of RE languages is relevant to computational linguistics through the lens of computability constraints. The undecidability of various grammar problems — such as determining whether two grammars generate the same language, or whether a grammar is ambiguous — directly impacts the design of grammar engineering tools and parsing systems. These impossibility results tell us what cannot be automated and where human judgment is irreplaceable.

The concept of Turing-completeness also arises in computational linguistics when evaluating the expressiveness of grammar formalisms and programming languages used for NLP. Understanding that certain computations are fundamentally impossible, regardless of available resources, is essential for setting realistic expectations about what linguistic analysis tools can achieve.

Related Topics

References

  1. Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1), 230–265. https://doi.org/10.1112/plms/s2-42.1.230
  2. Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2), 358–366. https://doi.org/10.1090/S0002-9947-1953-0053041-6
  3. Sipser, M. (2012). Introduction to the Theory of Computation (3rd ed.). Cengage Learning.
  4. Hopcroft, J. E., Motwani, R., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Addison-Wesley.

External Links