Part-of-speech (POS) tagging is the task of assigning a grammatical category label to each word (or token) in a sentence. These labels indicate the broad syntactic class (noun, verb, adjective, adverb, etc.) and, in finer-grained tagsets, morphosyntactic features such as tense, number, and case. POS tagging is one of the oldest and most fundamental tasks in NLP, serving as a prerequisite for nearly all downstream syntactic and semantic analysis. The task is challenging because many words are ambiguous: "bank" can be a noun or a verb, and "that" can be a determiner, a relative pronoun, or a complementizer.
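This kind of lexical ambiguity can be made concrete with a tag dictionary mapping each word type to the set of tags it can take. The entries below are hand-picked illustrations, not corpus statistics:

```python
# Toy tag dictionary illustrating lexical ambiguity (hand-picked entries,
# in the style of what one would extract from a tagged corpus).
TAG_DICT = {
    "bank": {"NN", "VB"},         # "the bank" vs. "planes bank left"
    "that": {"DT", "WDT", "IN"},  # determiner, relative pronoun, complementizer
    "the":  {"DT"},
    "runs": {"VBZ", "NNS"},       # verb ("she runs") vs. plural noun ("scored runs")
}

def is_ambiguous(word):
    """A word type is ambiguous if the lexicon lists more than one tag for it."""
    return len(TAG_DICT.get(word, set())) > 1
```

A tagger's job is precisely to resolve these ambiguous types in context; unambiguous types like "the" can be tagged by lookup alone.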
Tagsets
Penn Treebank (PTB): 45 tags, e.g.
  NN (noun, singular), NNS (noun, plural), NNP (proper noun)
  VB (verb, base form), VBD (verb, past tense), VBG (gerund/present participle), VBN (past participle)
  JJ (adjective), RB (adverb), DT (determiner), IN (preposition/subordinating conjunction)
Universal POS (UPOS): 17 tags
  NOUN, VERB, ADJ, ADV, ADP, DET, PRON, NUM, ...
Accuracy on English (PTB):
  Most-frequent-tag baseline: ~90%
  State of the art: ~97.5%
The two most widely used tagsets are the Penn Treebank tagset (45 tags for English) and the Universal POS tagset (17 tags across all languages). The PTB tagset makes finer distinctions (e.g., six verb forms) that are important for English syntax, while the UPOS tagset prioritizes cross-linguistic consistency. A simple baseline that assigns each word its most frequent training-set tag achieves about 90% accuracy; the remaining errors largely involve genuinely ambiguous tokens, plus unknown words, whose tags must be resolved from context.
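The most-frequent-tag baseline is a few lines of code. A minimal sketch, using a tiny invented training corpus and a hypothetical "NN" default for unknown words:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each word to its single most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, model, default="NN"):
    """Tag each word with its most frequent training tag; unknown words get a default."""
    return [model.get(w, default) for w in words]

# Tiny illustrative corpus (invented for this sketch):
train = [[("the", "DT"), ("bank", "NN"), ("opened", "VBD")],
         [("planes", "NNS"), ("bank", "VB"), ("left", "RB")],
         [("the", "DT"), ("bank", "NN"), ("closed", "VBD")]]
model = train_baseline(train)
print(tag_baseline(["the", "bank"], model))  # → ['DT', 'NN']
```

Note that "bank" is always tagged NN here, even in contexts where VB is correct; closing that gap is exactly what the context-sensitive methods below are for.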
Methods
POS tagging methods have evolved from rule-based systems (using hand-written disambiguation rules) through statistical models (HMMs, MEMMs, CRFs) to neural approaches. HMM taggers use the Viterbi algorithm to find the most likely tag sequence given the observed words. CRF taggers model the conditional probability of the tag sequence directly, avoiding the independence assumptions of HMMs. Modern taggers use BiLSTM or Transformer encoders, often as part of multi-task models that jointly predict POS tags, morphological features, and syntactic structure.
Role in NLP Pipelines
POS tags serve as features for nearly every higher-level NLP task. Parsers use POS tags to constrain the space of possible syntactic analyses. NER systems use POS patterns to identify entity boundaries. Information retrieval systems use POS tags to weight content words more heavily. Even in the era of end-to-end neural models, POS tagging remains relevant as an auxiliary training objective that provides useful inductive bias, and as an interpretability tool for understanding model behavior.