Word sense disambiguation (WSD) is the task of automatically determining the correct sense of a word in context, given a pre-defined sense inventory. Because most content words in natural language are polysemous -- "bank" can mean a financial institution, a river bank, or a pool shot -- WSD is necessary for accurate machine translation, information retrieval, and semantic analysis. WSD has been called an "AI-complete" problem by some researchers because it seems to require broad world knowledge and contextual reasoning, though practical systems achieve strong performance on standard benchmarks.
Approaches to WSD
Knowledge-based (Lesk algorithm):
s* = argmax_{s ∈ senses(w)} |gloss(s) ∩ context|
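The overlap criterion above can be sketched as a simplified Lesk implementation; the toy sense inventory and glosses below are illustrative stand-ins, not actual WordNet entries:

```python
# Simplified Lesk: pick the sense whose gloss shares the most words
# with the target word's context. Toy glosses, not real WordNet data.

def lesk(word, context, sense_inventory):
    """Return the sense of `word` whose gloss overlaps most with `context`."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        overlap = len(set(gloss.lower().split()) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses = {
    "bank": {
        "bank#1": "a financial institution that accepts deposits and lends money",
        "bank#2": "sloping land beside a body of water such as a river",
    }
}
print(lesk("bank", "she sat on the bank of the river and watched the water", senses))
```

Real implementations typically remove stopwords and extend glosses with those of related senses (the "extended Lesk" variant), since raw gloss-context overlap is often very small.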
Graph-based (PageRank on sense graph):
PR(s_i) = (1 − d) + d · Σ_{s_j ∈ adj(s_i)} [w(s_j, s_i) / Σ_k w(s_j, s_k)] · PR(s_j)
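The iteration defined by this formula can be sketched on a small sense graph; the graph, edge weights, and damping factor below are made-up illustrative values:

```python
# Weighted PageRank over a toy sense graph, following the formula above:
# each node's rank is (1 - d) plus d times the weight-normalized ranks
# of its neighbors. Graph and weights are invented for illustration.

def pagerank(graph, d=0.85, iters=50):
    """graph: {node: {neighbor: edge_weight}}; returns {node: rank}."""
    pr = {n: 1.0 for n in graph}
    for _ in range(iters):
        new = {}
        for i in graph:
            rank = 1.0 - d
            for j, w in graph[i].items():
                out = sum(graph[j].values())  # total outgoing weight of j
                rank += d * (w / out) * pr[j]
            new[i] = rank
        pr = new
    return pr

g = {
    "bank#1": {"money#1": 2.0},
    "bank#2": {"river#1": 1.0},
    "money#1": {"bank#1": 2.0, "river#1": 0.5},
    "river#1": {"bank#2": 1.0, "money#1": 0.5},
}
ranks = pagerank(g)
```

In a system like UKB the graph is the full WordNet relation graph and the context words are used to personalize the random walk, so the highest-ranked senses are those most connected to the context.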
Knowledge-based approaches exploit information in lexical resources: the Lesk algorithm measures overlap between dictionary glosses and the target context, while graph-based methods like UKB run PageRank on the WordNet graph to find the most central senses for a given context. Supervised approaches train classifiers on sense-annotated corpora (primarily SemCor) using contextual features such as surrounding words, part-of-speech tags, and syntactic relations; IMS (It Makes Sense) and neural WSD systems using contextualized embeddings from BERT achieve state-of-the-art results.
Evaluation and Benchmarks
WSD is evaluated on the Senseval/SemEval shared task benchmarks, using WordNet senses as the inventory. The most-frequent-sense (MFS) baseline, which assigns each word its most common sense in SemCor, is remarkably strong, achieving around 65% F1 on all-words WSD. State-of-the-art neural systems using BERT-based representations achieve around 80% F1, a substantial improvement but still far from human performance (estimated at around 90%). The difficulty varies greatly by word: some words are effectively monosemous in practice, while others have many fine-grained senses that are difficult even for human annotators to distinguish.
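The MFS baseline is simple enough to state in a few lines; the counts below are invented stand-ins for actual SemCor statistics:

```python
# Most-frequent-sense baseline: count sense annotations per word in a
# sense-tagged corpus and always predict the majority sense.
# The training pairs here are invented, not real SemCor data.

from collections import Counter

def mfs_baseline(annotations):
    """annotations: iterable of (word, sense) pairs; returns {word: top sense}."""
    counts = {}
    for word, sense in annotations:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

train = [("bank", "bank#1"), ("bank", "bank#1"), ("bank", "bank#2")]
model = mfs_baseline(train)
print(model["bank"])  # bank#1
```

The strength of this baseline reflects the skewed sense distributions of natural text: for many words, one sense dominates almost all occurrences.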
The rise of contextualized word embeddings has transformed WSD. Models like BERT produce different representations for the same word in different contexts, implicitly performing a form of soft disambiguation. A simple nearest-neighbor WSD system that compares a token's BERT embedding to sense-annotated examples achieves strong results. This has led some researchers to argue that WSD as a separate task is being subsumed by general-purpose contextual language modeling, though explicit disambiguation remains important for applications requiring discrete sense labels.
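The nearest-neighbor approach described above can be sketched as follows; the three-dimensional vectors are hand-made stand-ins for real contextual embeddings, which would come from running BERT over sense-annotated examples:

```python
# Nearest-neighbor WSD sketch: compare a token's contextual embedding to
# per-sense centroids built from sense-annotated examples. The vectors
# below are illustrative stand-ins, not real BERT embeddings.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nn_wsd(token_vec, sense_centroids):
    """Return the sense whose centroid is most cosine-similar to token_vec."""
    return max(sense_centroids, key=lambda s: cosine(token_vec, sense_centroids[s]))

centroids = {
    "bank#1": [0.9, 0.1, 0.0],  # centroid of "bank" in financial contexts
    "bank#2": [0.1, 0.8, 0.2],  # centroid of "bank" in river contexts
}
print(nn_wsd([0.2, 0.7, 0.1], centroids))  # bank#2
```

In practice each centroid is the mean of the contextual embeddings of all training occurrences of that sense, and senses with no annotated examples fall back to the MFS prediction.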
Applications and Challenges
WSD improves downstream NLP tasks including machine translation (selecting the correct translation for an ambiguous source word), information retrieval (matching queries to documents with the right sense), and text simplification (choosing the appropriate synonym). In knowledge base population, WSD is essential for linking entity mentions to the correct entries in a knowledge base (entity linking can be viewed as a form of WSD over an entity inventory).
Ongoing challenges include the granularity problem (WordNet senses are too fine-grained for many applications), the knowledge acquisition bottleneck (supervised WSD requires expensive sense-annotated data), and multilingual WSD (sense inventories differ across languages). Word-in-context (WiC) tasks, which ask whether a word has the same sense in two different contexts without requiring a fixed inventory, offer a promising inventory-free alternative that aligns well with contextualized embedding methods.
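A minimal inventory-free WiC-style decision can be sketched with a similarity threshold; the vectors and the 0.7 cutoff are illustrative assumptions, not values from any published system:

```python
# WiC-style same-sense decision without a sense inventory: two contextual
# embeddings of the same word are judged "same sense" if their cosine
# similarity clears a threshold. Vectors and threshold are illustrative.

import math

def same_sense(vec_a, vec_b, threshold=0.7):
    """True if the two token embeddings are similar enough to share a sense."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return dot / norm >= threshold

print(same_sense([0.9, 0.1], [0.8, 0.2]))  # True: similar contexts
print(same_sense([0.9, 0.1], [0.1, 0.9]))  # False: divergent contexts
```

In a real system the threshold would be tuned on WiC development data, and the embeddings would be contextual vectors for the target word in each sentence.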