The Hidden Markov Model (HMM) was one of the first successful statistical approaches to POS tagging. In this framework, the tag sequence is modeled as a first-order Markov chain (each tag depends only on the previous tag), and each tag generates an observed word according to an emission distribution. The tagging problem becomes finding the most probable hidden state sequence (tag sequence) given the observed sequence (word sequence), which is efficiently solved by the Viterbi algorithm.
HMM Formulation
t̂1..n = argmax over t1..tn of P(t1..tn | w1..wn) ≈ argmax over t1..tn of ∏i=1..n P(wi | ti) · P(ti | ti−1)
Transition probabilities: P(ti | ti−1) = count(ti−1, ti) / count(ti−1)
Emission probabilities: P(wi | ti) = count(ti, wi) / count(ti)
The model parameters are the transition probabilities (probability of each tag given the previous tag) and emission probabilities (probability of each word given its tag). These are estimated from a tagged corpus using maximum likelihood estimation (simple counting and normalization). The Baum-Welch algorithm (a special case of EM) can estimate parameters from untagged text, though supervised estimation from tagged corpora produces much better results.
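The counting-and-normalization step can be sketched as follows. The toy corpus, tag names, and the `<s>` start-of-sentence symbol are illustrative assumptions, not part of any real treebank:

```python
from collections import Counter

# Hypothetical toy tagged corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

transition_counts = Counter()   # count(t_{i-1}, t_i)
emission_counts = Counter()     # count(t_i, w_i)
tag_counts = Counter()          # count(t_i), denominator for emissions

for sent in corpus:
    prev = "<s>"                # start-of-sentence pseudo-tag
    for word, tag in sent:
        transition_counts[(prev, tag)] += 1
        emission_counts[(tag, word)] += 1
        tag_counts[tag] += 1
        prev = tag

# Denominator for transitions: count of each previous-tag context,
# including the start symbol.
context_counts = Counter()
for (prev, tag), c in transition_counts.items():
    context_counts[prev] += c

def p_transition(prev, tag):
    # P(ti | ti-1) = count(ti-1, ti) / count(ti-1)
    return transition_counts[(prev, tag)] / context_counts[prev]

def p_emission(tag, word):
    # P(wi | ti) = count(ti, wi) / count(ti)
    return emission_counts[(tag, word)] / tag_counts[tag]

print(p_transition("DT", "NN"))  # 1.0: DT is always followed by NN here
print(p_emission("NN", "dog"))   # 0.5: half the NN tokens are "dog"
```

In practice the raw MLE estimates are smoothed (and unknown words handled specially), since any zero count otherwise zeroes out an entire tag sequence.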
Viterbi Decoding
The Viterbi algorithm finds the most probable tag sequence in O(nT²) time, where n is the sentence length and T is the number of tags. It uses dynamic programming, maintaining for each position and tag the probability of the best tag sequence ending with that tag. Back-pointers allow the optimal sequence to be recovered. The forward algorithm computes the total probability of the observed sequence (summing over all possible tag sequences), which is useful for training and evaluation.
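A minimal sketch of Viterbi decoding (in log space, with back-pointers) alongside the forward algorithm. The tag set, probability tables, and the `<s>` start symbol below are hypothetical, chosen only to make the example self-contained:

```python
import math

def safe_log(p):
    # log(0) = -inf, so impossible transitions/emissions are never selected
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, tags, p_trans, p_emit, start="<s>"):
    """Most probable tag sequence under a bigram HMM, computed in log space."""
    n = len(words)
    best = [{} for _ in range(n)]   # best[i][t]: log-prob of best path ending in t
    back = [{} for _ in range(n)]   # back-pointers for recovering the sequence
    for t in tags:
        best[0][t] = safe_log(p_trans(start, t)) + safe_log(p_emit(t, words[0]))
    for i in range(1, n):
        for t in tags:
            # O(T) max over previous tags at each of the n*T cells: O(nT^2) total
            prev = max(tags, key=lambda s: best[i - 1][s] + safe_log(p_trans(s, t)))
            best[i][t] = (best[i - 1][prev] + safe_log(p_trans(prev, t))
                          + safe_log(p_emit(t, words[i])))
            back[i][t] = prev
    # follow back-pointers from the best final tag
    seq = [max(tags, key=lambda t: best[n - 1][t])]
    for i in range(n - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))

def forward(words, tags, p_trans, p_emit, start="<s>"):
    """Total probability of the words, summing over all tag sequences."""
    alpha = {t: p_trans(start, t) * p_emit(t, words[0]) for t in tags}
    for w in words[1:]:
        alpha = {t: sum(alpha[s] * p_trans(s, t) for s in tags) * p_emit(t, w)
                 for t in tags}
    return sum(alpha.values())

# Hypothetical toy model for illustration.
trans = {("<s>", "DT"): 1.0, ("DT", "NN"): 0.9, ("DT", "VB"): 0.1,
         ("NN", "VB"): 0.8, ("NN", "NN"): 0.2}
emit = {("DT", "the"): 1.0, ("NN", "dog"): 0.4, ("VB", "dog"): 0.1,
        ("NN", "barks"): 0.1, ("VB", "barks"): 0.6}
tags = ["DT", "NN", "VB"]
p_trans = lambda s, t: trans.get((s, t), 0.0)
p_emit = lambda t, w: emit.get((t, w), 0.0)

print(viterbi(["the", "dog", "barks"], tags, p_trans, p_emit))  # ['DT', 'NN', 'VB']
```

Note the difference between the two recurrences: Viterbi takes a max over previous tags (and must remember which tag won), while the forward algorithm takes a sum, so it needs no back-pointers.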
Limitations and Legacy
HMM taggers achieve approximately 95-96% accuracy on the Penn Treebank, which was state of the art in the early 1990s but has since been surpassed by discriminative models (MEMMs, CRFs) and neural taggers. The main limitations of HMMs are the strong independence assumptions (each word depends only on its tag, not on neighboring words or tags) and the generative nature of the model, which prevents the use of arbitrary overlapping features. Despite these limitations, HMM taggers remain an important pedagogical tool and a building block for understanding more advanced sequence models.