Combinatory Categorial Grammar

Combinatory Categorial Grammar builds syntax entirely from lexical categories and a small set of combinatory rules, achieving mildly context-sensitive power with a transparent syntax-semantics interface.

X/Y Y ⇒ X (forward application)

Combinatory Categorial Grammar (CCG), developed by Mark Steedman building on the categorial grammar tradition of Ajdukiewicz and Bar-Hillel, is a lexicalized grammar formalism in which virtually all syntactic information resides in the lexicon. Words are assigned categories that encode both their syntactic type and their combinatory potential — the types of arguments they seek and the direction in which they seek them. A small set of universal combinatory rules — application, composition, and type-raising — combines categories to build sentences. CCG achieves mildly context-sensitive generative power while maintaining a transparent mapping between syntactic derivation and semantic interpretation.

Categories and Application

CCG categories are either atomic (S for sentence, NP for noun phrase, N for noun) or complex, built with forward and backward slashes. A category X/Y seeks an argument of category Y to its right to produce a result of category X; a category X\Y seeks Y to its left. For example, an intransitive verb is S\NP (seeks an NP subject to its left), and a transitive verb is (S\NP)/NP (seeks an NP object to its right, then an NP subject to its left).
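The slash discipline above can be sketched directly. The following is a minimal illustrative encoding (assumed for this article, not taken from any parser library): atomic categories are strings, and a complex category is a tuple `(slash, result, argument)`, so `("/", X, Y)` encodes X/Y and `("\\", X, Y)` encodes X\Y.

```python
# Toy CCG categories (assumed encoding, for illustration only):
# ("/", X, Y) is X/Y (seeks Y to the right); ("\\", X, Y) is X\Y.

S, NP = "S", "NP"
INTRANS = ("\\", S, NP)        # S\NP      (intransitive verb)
TRANS = ("/", INTRANS, NP)     # (S\NP)/NP (transitive verb)

def fapp(left, right):
    """Forward application:  X/Y  Y  =>  X  (>)."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]

def bapp(left, right):
    """Backward application:  Y  X\\Y  =>  X  (<)."""
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]

# "likes beans": (S\NP)/NP applied to NP yields S\NP;
# "John" then combines with that VP by backward application.
vp = fapp(TRANS, NP)        # S\NP
assert bapp(NP, vp) == S    # the whole sentence derives S
```

The two application rules are deliberately partial functions: they return `None` when the categories do not fit, which is how a chart parser would filter candidate combinations.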

Core Combinatory Rules

Forward application: X/Y Y ⇒ X (>)
Backward application: Y X\Y ⇒ X (<)
Forward composition: X/Y Y/Z ⇒ X/Z (>B)
Backward composition: Y\Z X\Y ⇒ X\Z (<B)
Forward type-raising: X ⇒ T/(T\X) (>T)
Backward type-raising: X ⇒ T\(T/X) (<T)
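These rule schemata can be written out in the same toy tuple encoding used above (assumed for illustration: `("/", X, Y)` is X/Y, `("\\", X, Y)` is X\Y):

```python
# Hedged sketch of composition and type-raising over a toy category
# encoding: ("/", X, Y) is X/Y, ("\\", X, Y) is X\Y; atoms are strings.

def fcomp(left, right):
    """Forward composition:  X/Y  Y/Z  =>  X/Z  (>B)."""
    if (isinstance(left, tuple) and left[0] == "/" and
            isinstance(right, tuple) and right[0] == "/" and
            left[2] == right[1]):
        return ("/", left[1], right[2])

def bcomp(left, right):
    """Backward composition:  Y\\Z  X\\Y  =>  X\\Z  (<B)."""
    if (isinstance(left, tuple) and left[0] == "\\" and
            isinstance(right, tuple) and right[0] == "\\" and
            right[2] == left[1]):
        return ("\\", right[1], left[2])

def traise(x, t):
    """Forward type-raising:  X  =>  T/(T\\X)  (>T)."""
    return ("/", t, ("\\", t, x))

# A type-raised subject composes with a transitive verb before the
# object arrives, making "John likes" a constituent of type S/NP:
subj = traise("NP", "S")                 # S/(S\NP)
verb = ("/", ("\\", "S", "NP"), "NP")    # (S\NP)/NP
assert fcomp(subj, verb) == ("/", "S", "NP")
```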

Composition and Flexible Constituency

The composition and type-raising rules give CCG its characteristic power and flexibility. Composition allows partial constituents to combine before their arguments are complete, enabling incremental processing and accounting for constructions like right-node raising ("John likes and Mary hates beans"), where the shared object combines with two incomplete clauses. Type-raising promotes an argument to a function over its predicate, enabling it to compose with other functors. These rules license multiple derivations for the same string — a phenomenon known as spurious ambiguity — but all derivations of a sentence yield the same semantic interpretation.
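The right-node-raising example can be walked through step by step in the toy encoding used above (assumed for illustration; the coordination rule here is the simplified schema X conj X ⇒ X):

```python
# Toy derivation of "John likes and Mary hates beans".
# Encoding: ("/", X, Y) is X/Y, ("\\", X, Y) is X\Y; atoms are strings.

def traise(x, t):                       # X => T/(T\X)      (>T)
    return ("/", t, ("\\", t, x))

def fcomp(l, r):                        # X/Y  Y/Z => X/Z   (>B)
    if l[0] == "/" and isinstance(r, tuple) and r[0] == "/" and l[2] == r[1]:
        return ("/", l[1], r[2])

def fapp(l, r):                         # X/Y  Y => X       (>)
    if isinstance(l, tuple) and l[0] == "/" and l[2] == r:
        return l[1]

def coord(l, r):                        # X conj X => X (simplified schema)
    return l if l == r else None

TV = ("/", ("\\", "S", "NP"), "NP")     # "likes", "hates": (S\NP)/NP

john_likes = fcomp(traise("NP", "S"), TV)   # "John likes": S/NP
mary_hates = fcomp(traise("NP", "S"), TV)   # "Mary hates": S/NP
conjunct = coord(john_likes, mary_hates)    # coordinated S/NP
assert fapp(conjunct, "NP") == "S"          # ... "beans" completes S
```

Each conjunct becomes a constituent of type S/NP via type-raising and composition, which is exactly why the single shared object can serve both clauses.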

The Principle of Adjacency

CCG obeys a strict adjacency principle: combinatory rules apply only to string-adjacent constituents. Long-distance dependencies such as wh-extraction and relativization are handled not by movement transformations but by composition and type-raising, which allow a gap to be propagated through the derivation. This treats long-distance dependencies as a natural consequence of the combinatory apparatus rather than as a special mechanism, and it correctly predicts certain island constraints on extraction.
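Gap propagation can be illustrated with a relative clause in the same toy encoding (assumed for illustration; the relative-pronoun category (N\N)/(S/NP) follows the standard CCG analysis):

```python
# Sketch of "beans that John likes" in the toy tuple encoding:
# ("/", X, Y) is X/Y, ("\\", X, Y) is X\Y; atoms are strings.

def traise(x, t):                       # X => T/(T\X)      (>T)
    return ("/", t, ("\\", t, x))

def fcomp(l, r):                        # X/Y  Y/Z => X/Z   (>B)
    if l[0] == "/" and isinstance(r, tuple) and r[0] == "/" and l[2] == r[1]:
        return ("/", l[1], r[2])

def fapp(l, r):                         # X/Y  Y => X       (>)
    if isinstance(l, tuple) and l[0] == "/" and l[2] == r:
        return l[1]

def bapp(l, r):                         # Y  X\Y => X       (<)
    if isinstance(r, tuple) and r[0] == "\\" and r[2] == l:
        return r[1]

TV = ("/", ("\\", "S", "NP"), "NP")                # "likes": (S\NP)/NP
REL = ("/", ("\\", "N", "N"), ("/", "S", "NP"))    # "that": (N\N)/(S/NP)

# "John likes" composes into S/NP -- a clause with an object gap --
# with no movement operation anywhere in the derivation.
gapped = fcomp(traise("NP", "S"), TV)              # S/NP
relclause = fapp(REL, gapped)                      # N\N
assert bapp("N", relclause) == "N"                 # "beans that John likes"
```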

Parsing and NLP Applications

CCG parsing has been remarkably successful in NLP. The C&C parser, developed by Stephen Clark and James Curran, uses a supertagger to assign lexical categories and a chart parser to find combinatory derivations, and achieved state-of-the-art speed and accuracy at the time of its release. CCGbank, derived from the Penn Treebank by Julia Hockenmaier and Mark Steedman, provides the standard training data for statistical CCG parsers. More recently, neural supertaggers and parsers have pushed accuracy higher still, and the transparent syntax-semantics interface makes CCG derivations directly usable for semantic parsing, question answering, and natural language inference.
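The chart-parsing idea can be sketched with a toy CKY-style loop over the tuple encoding used above. This is an illustrative sketch only, not the C&C parser's actual algorithm; lexical categories are assumed already assigned, as a supertagger would do:

```python
# Toy CKY-style CCG chart (assumed tuple encoding: ("/", X, Y) is X/Y,
# ("\\", X, Y) is X\Y). Only the two application rules are included.

def fapp(l, r):                         # X/Y  Y => X   (>)
    if isinstance(l, tuple) and l[0] == "/" and l[2] == r:
        return l[1]

def bapp(l, r):                         # Y  X\Y => X   (<)
    if isinstance(r, tuple) and r[0] == "\\" and r[2] == l:
        return r[1]

RULES = [fapp, bapp]

def parse(cats):
    """Return all categories derivable for the whole input span."""
    n = len(cats)
    chart = {(i, i + 1): {cats[i]} for i in range(n)}  # length-1 spans
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = chart.setdefault((i, i + length), set())
            for k in range(i + 1, i + length):          # split points
                for l in chart[(i, k)]:
                    for r in chart[(k, i + length)]:
                        for rule in RULES:
                            out = rule(l, r)
                            if out is not None:
                                cell.add(out)
    return chart[(0, n)]

# "John likes beans" with lexical categories NP, (S\NP)/NP, NP:
cats = ["NP", ("/", ("\\", "S", "NP"), "NP"), "NP"]
assert "S" in parse(cats)
```

A real CCG parser adds composition and type-raising (restricted to keep the chart tractable) and scores competing derivations with a statistical or neural model.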

CCG's formal properties — mildly context-sensitive power equivalent to TAG and linear indexed grammars, polynomial-time parsability, and the semantic transparency of the Curry-Howard correspondence between derivations and lambda terms — make it one of the most theoretically elegant and practically effective grammar formalisms in computational linguistics.
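The semantic transparency mentioned above can be made concrete: pair each lexical category with a semantic term, and syntactic application becomes function application on the semantic side. The sketch below uses Python lambdas as stand-in lambda terms (an assumed toy representation, for illustration only):

```python
# Syntax-semantics pairing: a lexical item is (category, semantics);
# categories use the toy encoding ("/", X, Y) is X/Y, ("\\", X, Y) is X\Y.

def fapp(l, r):
    """Forward application; the functor's semantics applies to the argument's."""
    cat_l, sem_l = l
    cat_r, sem_r = r
    if isinstance(cat_l, tuple) and cat_l[0] == "/" and cat_l[2] == cat_r:
        return (cat_l[1], sem_l(sem_r))

def bapp(l, r):
    """Backward application; semantics mirrors the syntax exactly."""
    cat_l, sem_l = l
    cat_r, sem_r = r
    if isinstance(cat_r, tuple) and cat_r[0] == "\\" and cat_r[2] == cat_l:
        return (cat_r[1], sem_r(sem_l))

john = ("NP", "john")
beans = ("NP", "beans")
likes = (("/", ("\\", "S", "NP"), "NP"),            # (S\NP)/NP
         lambda obj: lambda subj: f"likes({subj},{obj})")

vp = fapp(likes, beans)          # (S\NP, λx.likes(x, beans))
cat, sem = bapp(john, vp)
assert cat == "S" and sem == "likes(john,beans)"
```

Because every combinatory rule has a fixed semantic counterpart, building the derivation *is* building the logical form; no separate interpretation pass is needed.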

References

  1. Steedman, M. (2000). The Syntactic Process. MIT Press.
  2. Steedman, M., & Baldridge, J. (2011). Combinatory categorial grammar. In R. Borsley & K. Börjars (Eds.), Non-Transformational Syntax (pp. 181–224). Blackwell. https://doi.org/10.1002/9781444395037.ch5
  3. Clark, S., & Curran, J. R. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552. https://doi.org/10.1162/coli.2007.33.4.493
  4. Hockenmaier, J., & Steedman, M. (2007). CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3), 355–396. https://doi.org/10.1162/coli.2007.33.3.355
