Computational Linguistics

Supertagging

Supertagging assigns rich lexical categories (supertags) to words that encode detailed syntactic subcategorization and combinatory information, functioning as 'almost parsing' by heavily constraining the space of possible derivations.

P(s_i | w, context) where s_i ∈ {S\NP, (S\NP)/NP, NP, ...} (CCG supertags)

Supertagging, a term coined by Bangalore and Joshi (1999), is the task of assigning lexicalized grammar categories (supertags) to words. Unlike simple POS tags, supertags encode rich syntactic information including subcategorization frames, argument structure, and combinatory potential. In Combinatory Categorial Grammar (CCG), a supertag like (S\NP)/NP assigned to a transitive verb specifies that it takes an NP object to the right and an NP subject to the left to form a sentence. By assigning supertags, a supertagging model does much of the work of parsing, earning it the description "almost parsing."
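To make concrete what a supertag encodes, here is a minimal sketch (the helper names `split_outermost` and `subcat_frame` are ours, not from any standard library) that decodes a CCG category string into its atomic result plus the ordered subcategorization frame it specifies:

```python
def split_outermost(cat):
    """Split a category at its outermost slash, honoring parentheses.
    Returns (result, slash, argument), or None if the category is atomic."""
    depth = 0
    # Scan right to left: the rightmost slash at paren depth 0 is outermost.
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif c in '/\\' and depth == 0:
            return cat[:i], c, cat[i + 1:]
    return None

def unwrap(cat):
    """Peel one layer of outer parentheses, e.g. '(S\\NP)' -> 'S\\NP'."""
    return cat[1:-1] if cat.startswith('(') and cat.endswith(')') else cat

def subcat_frame(cat):
    """Return (atomic result, list of (direction, argument)), outermost first."""
    frame = []
    while True:
        parts = split_outermost(cat)
        if parts is None:
            return cat, frame
        result, slash, arg = parts
        frame.append(('right' if slash == '/' else 'left', unwrap(arg)))
        cat = unwrap(result)

# The transitive-verb supertag: takes NP to the right, then NP to the left, yielding S.
print(subcat_frame('(S\\NP)/NP'))  # ('S', [('right', 'NP'), ('left', 'NP')])
```

The frame read off the category is exactly the information a POS tag like VBZ leaves unspecified.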

Supertag Categories

CCG Supertag Examples

NP: noun phrase (e.g., "John")
S\NP: intransitive verb (e.g., "sleeps")
(S\NP)/NP: transitive verb (e.g., "likes")
((S\NP)/PP)/NP: ditransitive verb with PP (e.g., "put")
N/N: prenominal modifier (e.g., "big")
(NP\NP)/NP: preposition (e.g., "of")

CCGbank contains ~1,200 distinct supertag types
(vs. 45 POS tags in the Penn Treebank)

The key challenge in supertagging is the very large label set: CCGbank contains roughly 1,200 distinct supertag categories, compared with just 45 Penn Treebank POS tags. The distribution is heavily skewed, however: the 400 most frequent categories cover over 99% of tokens. Because the labels are so informative, once supertags are assigned the parsing problem is highly constrained, often reducing to a nearly deterministic combination of adjacent categories.
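A toy illustration of that "nearly deterministic combination" (not a real parser; it only handles atomic arguments, and the function names are ours): once "John likes Mary" is supertagged NP, (S\NP)/NP, NP, forward and backward application combine adjacent categories straight to S.

```python
def unwrap(cat):
    """Peel one layer of outer parentheses, e.g. '(S\\NP)' -> 'S\\NP'."""
    return cat[1:-1] if cat.startswith('(') and cat.endswith(')') else cat

def forward_apply(left, right):
    """Forward application (>):  X/Y  Y  =>  X  (atomic arguments only)."""
    if left.endswith('/' + right):
        return unwrap(left[:-(len(right) + 1)])
    return None

def backward_apply(left, right):
    """Backward application (<):  Y  X\\Y  =>  X  (atomic arguments only)."""
    if right.endswith('\\' + left):
        return unwrap(right[:-(len(left) + 1)])
    return None

# Supertags assigned to "John likes Mary": NP, (S\NP)/NP, NP
vp = forward_apply('(S\\NP)/NP', 'NP')  # "likes Mary" -> S\NP
s = backward_apply('NP', vp)            # "John [likes Mary]" -> S
print(s)                                # S
```

With the supertags fixed, no other combination of these three categories succeeds, which is the sense in which supertagging is "almost parsing."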

Methods and Integration with Parsing

Early supertaggers used HMMs and MaxEnt models. Clark and Curran (2007) developed a log-linear supertagger integrated with a CCG parser, where the supertagger provides a pruned set of candidate categories and the parser selects among them. Modern neural supertaggers using BiLSTMs and Transformers achieve over 96% accuracy on CCGbank. Lewis et al. (2016) showed that a BiLSTM supertagger combined with simple A* parsing could match the accuracy of much more complex systems.
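The pruning idea behind the Clark and Curran (2007) pipeline can be sketched as follows (the probabilities and function name are illustrative, not from their system): for each word, keep every category whose probability is within a factor β of the best one, and let the parser search only over these shortlists.

```python
def prune_categories(dist, beta=0.1):
    """Keep categories with prob >= beta * max prob, most probable first.
    `dist` maps supertag strings to probabilities for one word."""
    best = max(dist.values())
    return sorted(
        (c for c, p in dist.items() if p >= beta * best),
        key=lambda c: -dist[c],
    )

# Hypothetical supertagger distribution for the word "likes"
likes_dist = {'(S\\NP)/NP': 0.85, 'S\\NP': 0.09, '(S\\NP)/PP': 0.04, 'N': 0.02}
print(prune_categories(likes_dist, beta=0.1))  # ['(S\\NP)/NP', 'S\\NP']
```

Tightening β shrinks the parser's search space at some risk of pruning the correct category; Clark and Curran relax β adaptively when parsing fails.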

Supertagging for Other Formalisms
While most closely associated with CCG, supertagging has been applied to other lexicalized grammar formalisms including Tree-Adjoining Grammar (TAG), Head-Driven Phrase Structure Grammar (HPSG), and Lexical Functional Grammar (LFG). In each case, the supertags encode the lexical entry's syntactic properties, constraining the space of possible parses.

Supertagging as Feature Extraction

Beyond its role in grammar-driven parsing, supertagging provides linguistically rich features for downstream tasks. Supertag sequences encode predicate-argument structure and long-distance dependencies that are not captured by POS tags alone. They have been used as features for semantic role labeling, machine translation, and text generation. The success of supertagging demonstrates that much of syntactic structure can be determined locally at the word level, with only lightweight global parsing needed to resolve remaining ambiguities.
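As one small example of such a feature (our own illustration, not a published feature set): a category's arity, the number of arguments it still expects, distinguishes transitive from intransitive uses of a verb in a way a POS tag like VBZ cannot.

```python
def arity(cat):
    """Number of arguments a CCG category expects: repeatedly peel off
    the outermost slash (at paren depth 0) until the result is atomic."""
    n = 0
    while True:
        depth, split = 0, None
        for i in range(len(cat) - 1, -1, -1):
            if cat[i] == ')':
                depth += 1
            elif cat[i] == '(':
                depth -= 1
            elif cat[i] in '/\\' and depth == 0:
                split = i
                break
        if split is None:
            return n
        n += 1
        cat = cat[:split]
        if cat.startswith('(') and cat.endswith(')'):
            cat = cat[1:-1]  # peel one layer of parentheses off the result

print(arity('S\\NP'))       # 1: intransitive "sleeps"
print(arity('(S\\NP)/NP'))  # 2: transitive "likes"
```

Feature values like these, read directly off the supertag sequence, expose argument structure to downstream models without running a parser.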

References

  1. Bangalore, S., & Joshi, A. K. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25(2), 237–265. https://doi.org/10.5555/973306.973310
  2. Clark, S., & Curran, J. R. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552. https://doi.org/10.1162/coli.2007.33.4.493
  3. Lewis, M., Lee, K., & Zettlemoyer, L. (2016). LSTM CCG parsing. Proceedings of NAACL-HLT 2016, 221–231. https://doi.org/10.18653/v1/N16-1026
  4. Vaswani, A., Bisk, Y., Sagae, K., & Musa, R. (2016). Supertagging with LSTMs. Proceedings of NAACL-HLT 2016, 232–237. https://doi.org/10.18653/v1/N16-1027