
Tree-Adjoining Grammar

Tree-adjoining grammars extend context-free power with an adjunction operation on trees, yielding a mildly context-sensitive language class that closely matches the complexity of natural language syntax.

TAG = (Σ, N, I, A, S) with substitution and adjunction

Tree-adjoining grammar (TAG), introduced by Joshi, Levy, and Takahashi in 1975 and developed extensively through the 1980s, is a tree-rewriting formalism that generates trees rather than strings. TAG extends the generative capacity of context-free grammars by introducing an adjunction operation that inserts one tree into the interior of another. This additional operation gives TAG the power to generate some context-sensitive languages, specifically a class of mildly context-sensitive languages, while maintaining polynomial-time parsability. TAG has been enormously influential in computational linguistics, providing a mathematically precise formalism that closely matches the syntactic complexity of natural language.

Elementary Trees and Operations

A TAG consists of two finite sets of elementary trees: initial trees (I) and auxiliary trees (A). Initial trees are rooted in the start symbol and have terminal and non-terminal leaves — they represent basic phrase structures. Auxiliary trees have a distinguished foot node on their frontier, labeled with the same non-terminal as the root — they represent recursive or modifier structures. Two operations combine elementary trees: substitution replaces a non-terminal leaf in one tree with an initial tree, and adjunction inserts an auxiliary tree at an interior node of another tree.

Adjunction Operation
Given tree α with interior node n labeled A,
and auxiliary tree β with root and foot labeled A:
Adjoin β at n: detach the subtree at n, replace it with β,
and reattach the detached subtree at the foot of β.
Result: β wraps around the original subtree at n.
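The wrapping behavior described above can be sketched concretely. The following is a minimal illustration, assuming a toy (label, children) tuple encoding of trees, with the foot node marked by a trailing '*' on its label; the function names and encoding are illustrative, not taken from any standard TAG toolkit.

```python
# Minimal sketch of TAG adjunction on a toy tree encoding:
# a tree is (label, [children]); a leaf is just a string.
# A foot node is a leaf whose label ends with '*'.

def adjoin(tree, path, aux):
    """Adjoin auxiliary tree `aux` at the interior node of `tree`
    addressed by `path` (a list of child indices): detach the subtree
    there, replace it with `aux`, and reattach the detached subtree
    at the foot node of `aux`."""
    if not path:
        label, _ = tree
        aux_label, _ = aux
        assert label == aux_label, "adjunction site must match aux root label"
        return _plug_foot(aux, tree)
    label, children = tree
    i, rest = path[0], path[1:]
    new_children = list(children)
    new_children[i] = adjoin(children[i], rest, aux)
    return (label, new_children)

def _plug_foot(aux, subtree):
    """Reattach the detached subtree at the foot node (leaf ending in '*')."""
    label, children = aux
    new_children = []
    for c in children:
        if isinstance(c, str) and c.endswith('*'):
            new_children.append(subtree)          # foot node: reattach here
        elif isinstance(c, tuple):
            new_children.append(_plug_foot(c, subtree))
        else:
            new_children.append(c)
    return (label, new_children)

# Example: adjoin an adverbial auxiliary tree at the VP node of a clause.
alpha = ('S', [('NP', ['Kim']), ('VP', [('V', ['sleeps'])])])
beta  = ('VP', ['VP*', ('Adv', ['soundly'])])     # root VP, foot VP*
result = adjoin(alpha, [1], beta)
print(result)
# ('S', [('NP', ['Kim']), ('VP', [('VP', [('V', ['sleeps'])]), ('Adv', ['soundly'])])])
```

Note how β wraps around the original VP subtree: the verb ends up below the foot, and the adverb attaches as its sister, exactly the "wrapping" effect adjunction is designed to produce.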

Mild Context-Sensitivity

TAG generates the class of tree-adjoining languages (TALs), which properly contains the context-free languages. TAGs can generate the copy language {ww}, the counting language {a^n b^n c^n d^n}, the cross-serial dependency language {a^n b^m c^n d^m}, and similar patterns that arise in natural language but lie beyond context-free power. However, TAGs cannot generate all context-sensitive languages (the five-count language {a^n b^n c^n d^n e^n}, for example, is beyond TAG power); they are constrained to constant growth, meaning that each derivation step can lengthen the generated string by at most a constant amount. Joshi characterized this intermediate class as mildly context-sensitive, arguing that it captures the generative capacity needed for natural language.
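The source of this extra power can be illustrated schematically. The sketch below is plain Python, not a TAG implementation: it simulates repeated adjunction of an auxiliary tree whose yield wraps 'a…b' around one part of the string and 'c…d' around another, producing a counting pattern that a CFG cannot generate by rewriting a single non-terminal.

```python
# Schematic illustration of why adjunction exceeds context-free power:
# each adjunction step wraps new material around TWO positions of the
# derived string simultaneously, yielding {a^n b^n c^n d^n}.

def derive(n):
    """Simulate n adjunctions: 'a…b' grows around the left wrapping
    point while 'c…d' grows in lockstep around the right one."""
    left, right = '', ''
    for _ in range(n):
        left = 'a' + left + 'b'      # growth at the first wrapping point
        right = 'c' + right + 'd'    # simultaneous growth at the second
    return left + right

print([derive(n) for n in range(4)])
# ['', 'abcd', 'aabbccdd', 'aaabbbcccddd']
```

Because every step adds a fixed amount of material (here four symbols), string length grows linearly in the number of adjunctions, which is the constant-growth property in miniature.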

Lexicalized TAG (LTAG)

In lexicalized TAG, every elementary tree is anchored by at least one lexical item (terminal symbol). This means that every piece of the grammar is associated with a word, making the formalism inherently lexicalized. LTAG provides an extended domain of locality: each elementary tree encapsulates the entire argument structure of its anchor word, including long-distance dependencies. The XTAG project at the University of Pennsylvania developed a large-scale LTAG grammar of English that demonstrated the linguistic descriptive adequacy of the formalism.
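What "anchored" and "extended domain of locality" mean in practice can be sketched with a toy example. The encoding below is illustrative, not XTAG's: trees are (label, children) tuples, and substitution-node leaves carry a trailing '↓'.

```python
# Minimal sketch of a lexicalized elementary tree and substitution.
# The tree anchored by 'likes' localizes the verb's entire argument
# structure (subject NP and object NP) in a single elementary unit.

def substitute(tree, path, initial):
    """Replace the substitution node addressed by `path` (child indices)
    with the initial tree `initial`; the site label must match its root."""
    label, children = tree
    i, rest = path[0], path[1:]
    new_children = list(children)
    if rest:
        new_children[i] = substitute(children[i], rest, initial)
    else:
        root, _ = initial
        assert children[i] == root + '↓', "site label must match root label"
        new_children[i] = initial
    return (label, new_children)

def yield_of(tree):
    """Read off the terminal string at the frontier of a derived tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1] for w in yield_of(c)]

# Elementary tree anchored by the transitive verb 'likes'.
likes = ('S', ['NP↓', ('VP', [('V', ['likes']), 'NP↓'])])
kim   = ('NP', ['Kim'])
pie   = ('NP', ['pie'])

t = substitute(likes, [0], kim)       # plug in the subject
t = substitute(t, [1, 1], pie)        # plug in the object
print(' '.join(yield_of(t)))
# Kim likes pie
```

Both argument positions live inside the one tree anchored by 'likes', which is the extended domain of locality: the verb's selectional requirements never have to be threaded through separate grammar rules.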

Parsing and Applications

TAG parsing can be performed in O(n⁶) time using the Earley-style algorithm of Schabes and Joshi (1988) or CYK-style dynamic programming. While this is more expensive than O(n³) CFG parsing, it remains polynomial and is practical for moderate-length sentences. Supertagging — the use of a sequential labeling model to assign elementary trees to words before full parsing — dramatically reduces the effective search space and has made TAG parsing competitive with CFG-based approaches in practice.
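The search-space effect of supertagging can be shown with a toy example. The four-word lexicon below is hypothetical (not the XTAG grammar), and the tree names are placeholders.

```python
# Toy illustration of supertagging as search-space pruning: with k
# candidate elementary trees per word, a naive parser faces k^n tree
# assignments for an n-word sentence; a supertagger that commits to
# one tree per word leaves a single assignment for the parser to verify.
from math import prod

lexicon = {                     # word -> candidate elementary trees (names only)
    'Kim':  ['NP-simple'],
    'saw':  ['V-transitive', 'V-intransitive', 'N-noun'],
    'her':  ['Det', 'NP-pronoun'],
    'duck': ['N-noun', 'V-intransitive'],
}
sentence = ['Kim', 'saw', 'her', 'duck']

before = prod(len(lexicon[w]) for w in sentence)     # 1 * 3 * 2 * 2 = 12
supertagged = {w: lexicon[w][:1] for w in sentence}  # keep top-ranked tree only
after = prod(len(supertagged[w]) for w in sentence)  # 1

print(before, after)
# 12 1
```

In realistic grammars the ambiguity per word is far higher (XTAG assigns dozens of trees to common verbs), so the multiplicative savings are correspondingly larger.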

TAG and its variants (multi-component TAG, synchronous TAG) have been applied to a wide range of linguistic phenomena including wh-movement, relative clauses, topicalization, scrambling in free word-order languages, and machine translation. The formalism's combination of extended locality, mild context-sensitivity, and polynomial parsability makes it one of the most linguistically and computationally satisfying grammar frameworks available.

References

  1. Joshi, A. K., Levy, L. S., & Takahashi, M. (1975). Tree adjunct grammars. Journal of Computer and System Sciences, 10(1), 136–163. https://doi.org/10.1016/S0022-0000(75)80019-5
  2. Joshi, A. K. (1985). Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural Language Parsing (pp. 206–250). Cambridge University Press.
  3. Schabes, Y., & Joshi, A. K. (1988). An Earley-type parsing algorithm for tree adjoining grammars. In Proceedings of the 26th Annual Meeting of the ACL (pp. 258–269). https://doi.org/10.3115/982023.982055
  4. XTAG Research Group. (2001). A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS-01-03, University of Pennsylvania.