Computational Linguistics

Inflection

Inflection modifies words to express grammatical categories such as tense, number, case, and gender without changing their core meaning or part of speech, and computational models of inflection are central to morphological generation and analysis.

inflect(lemma, features) → surface form

Inflection is the morphological process by which a word is modified to encode grammatical information — tense, aspect, mood, person, number, gender, case, definiteness, and other categories — while preserving the word's fundamental meaning and part of speech. Unlike derivation, which creates new lexemes (e.g., "happy" to "happiness"), inflection creates different forms of the same lexeme (e.g., "run," "runs," "ran," "running"). Computational modeling of inflection is essential for morphological generation, machine translation, grammatical error correction, and any system that must produce correctly inflected text.
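The inflect(lemma, features) interface above can be made concrete with a minimal lookup-table sketch (the table entries and feature tags here are illustrative, not from any standard resource):

```python
# Minimal sketch of the inflect(lemma, features) interface, backed by a
# hand-written lookup table. Real systems derive these mappings by rule
# or by learned models rather than enumeration.
PARADIGMS = {
    ("run", ("Pres", "3sg")): "runs",
    ("run", ("Past",)): "ran",
    ("run", ("PresPart",)): "running",
    ("run", ("PastPart",)): "run",
}

def inflect(lemma, features):
    """Return the surface form of `lemma` for the given feature bundle."""
    return PARADIGMS[(lemma, features)]

print(inflect("run", ("Past",)))  # → ran
```

Note that all four entries share the lemma "run": they are forms of one lexeme, whereas a derivational table would pair distinct lexemes.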

Inflectional Paradigms

Inflectional Paradigm. For a verbal lemma L, let F be the set of feature bundles f drawn from Tense × Person × Number × Mood:
paradigm(L) = { inflect(L, f) | f ∈ F }

Example (Spanish "hablar"):
inflect(hablar, {Pres, 1sg, Ind}) → hablo
inflect(hablar, {Pres, 2sg, Ind}) → hablas
inflect(hablar, {Pret, 3sg, Ind}) → habló
inflect(hablar, {Pres, 1pl, Subj}) → hablemos

An inflectional paradigm is the complete set of forms a lexeme can take. Paradigm size varies enormously across languages: an English verb has at most five distinct forms (e.g., sing, sings, sang, sung, singing; regular verbs such as walk have only four), while a Finnish noun can have over 2,000 forms once all case, number, and possessive combinations are counted. Paradigm structure exhibits regularities that computational models exploit: most forms follow predictable patterns, with irregularity concentrated in high-frequency items.
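For a regular paradigm, the set paradigm(L) can be enumerated by crossing feature values and applying a suffix rule. A sketch for the present indicative of Spanish -ar verbs (standard textbook endings; the cell representation is an assumption of this sketch):

```python
from itertools import product

# Regular present-indicative endings for Spanish -ar verbs.
ENDINGS = {
    ("1", "sg"): "o",    ("2", "sg"): "as",  ("3", "sg"): "a",
    ("1", "pl"): "amos", ("2", "pl"): "áis", ("3", "pl"): "an",
}

def paradigm(lemma):
    """Enumerate { inflect(L, f) | f ∈ Person × Number } for a regular -ar verb."""
    stem = lemma[:-2]  # strip the -ar infinitive ending
    return {(person, number): stem + ENDINGS[(person, number)]
            for person, number in product("123", ("sg", "pl"))}

print(paradigm("hablar")[("1", "sg")])  # → hablo
print(paradigm("hablar")[("3", "pl")])  # → hablan
```

Irregular verbs are exactly the lexemes for which this stem-plus-ending rule fails and individual cells must be stored instead.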

Neural Morphological Inflection

The SIGMORPHON shared tasks have established morphological inflection — generating the correct surface form given a lemma and a set of morphological features — as a benchmark task for neural sequence modeling. Encoder-decoder architectures with attention, operating at the character level, achieve high accuracy across typologically diverse languages. These models take as input the characters of the lemma concatenated with feature tags and produce the characters of the inflected form. Hard attention mechanisms and copy mechanisms improve performance by allowing the model to preserve stem characters while modifying affixes.
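The serialization step these models rely on is simple: feature tags become atomic input symbols alongside the lemma's characters. A sketch (the exact tag inventory and separator convention vary across SIGMORPHON systems; the ones below are illustrative):

```python
# Serialize a (lemma, features) pair into the symbol sequence fed to a
# character-level encoder-decoder. Tags are kept as single atomic symbols;
# only the lemma is split into characters.
def serialize(lemma, features):
    return features + ["<sep>"] + list(lemma)

src = serialize("hablar", ["V", "PRS", "1", "SG", "IND"])
tgt = list("hablo")
print(src)
# → ['V', 'PRS', '1', 'SG', 'IND', '<sep>', 'h', 'a', 'b', 'l', 'a', 'r']
```

Because most output characters are copied verbatim from the stem ("h a b l"), copy and hard-attention mechanisms have a strong, nearly monotonic alignment to exploit.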

The Paradigm Cell Filling Problem

A fundamental question in inflectional morphology is how speakers (and models) generalize from observed forms to fill unobserved cells in a paradigm. Ackerman, Blevins, and Malouf (2009) formalized this as the "paradigm cell filling problem": given some forms of a lexeme, predict the rest. Information-theoretic analyses show that paradigms in natural languages tend to have low conditional entropy — knowing one form strongly constrains others — suggesting that languages are structured to facilitate this generalization. Neural models implicitly learn these inter-form dependencies during training.
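The low-conditional-entropy claim can be illustrated with a toy computation of H(B | A) between two paradigm cells (the mini-lexicon of abstract inflection patterns below is hypothetical):

```python
from collections import Counter
from math import log2

# Each pair records (past-tense pattern, past-participle pattern) for one
# verb in a hypothetical mini-lexicon. Here the past pattern fully
# determines the participle pattern, so H(B | A) = 0 bits.
observations = [
    ("-ed", "-ed"), ("-ed", "-ed"), ("-ed", "-ed"),  # regular verbs
    ("i→a", "i→u"),                                   # sing/sang/sung type
    ("i→a", "i→u"),
]

def conditional_entropy(pairs):
    """H(B | A) in bits, estimated from (cell-A pattern, cell-B pattern) pairs."""
    joint = Counter(pairs)
    marginal = Counter(a for a, _ in pairs)
    n = len(pairs)
    return sum((c / n) * log2(marginal[a] / c) for (a, _), c in joint.items())

print(conditional_entropy(observations))  # → 0.0
```

On real paradigms the estimate is rarely exactly zero, but Ackerman, Blevins, and Malouf's point is that it stays far below the entropy of the cell taken in isolation.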

Syncretism and Irregularity

Inflectional systems exhibit syncretism (identical forms for different feature combinations) and irregularity (forms that deviate from regular patterns). Syncretism is linguistically systematic — for example, German adjective endings systematically collapse distinctions in certain environments — and can be modeled by mapping multiple feature bundles to the same exponent. Irregular inflection, as in English strong verbs (sing/sang/sung), requires memorization of specific forms. Neural inflection models handle irregularity by memorizing patterns from training data, but struggle with rare irregulars not seen during training.
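Modeling syncretism as a many-to-one map from feature bundles to exponents can be sketched directly, using the German weak adjective declension mentioned above (this follows the standard grammatical description; the cell encoding is an assumption of this sketch):

```python
# German weak adjective declension (after a definite article): nominative
# singular of all genders and accusative singular feminine/neuter take -e;
# every remaining cell syncretically takes -en.
E_CELLS = {
    ("nom", "sg", "m"), ("nom", "sg", "f"), ("nom", "sg", "n"),
    ("acc", "sg", "f"), ("acc", "sg", "n"),
}

def weak_ending(case, number, gender):
    """Map a feature bundle to its exponent; many bundles share '-en'."""
    return "-e" if (case, number, gender) in E_CELLS else "-en"

print(weak_ending("nom", "sg", "m"))  # → -e
print(weak_ending("dat", "pl", "f"))  # → -en
```

Nineteen of the twenty-four cells collapse onto "-en", which is what makes the syncretism systematic rather than accidental.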

Cross-linguistic variation in inflection is enormous. Isolating languages like Mandarin Chinese have virtually no inflection, while polysynthetic languages like Mohawk encode in a single verb form what English requires an entire sentence to express. Computational models of inflection must be flexible enough to handle this typological diversity, and recent multilingual approaches use language embeddings and shared character representations to transfer inflectional knowledge across related languages.

Interactive Calculator

Enter words (one per line). The calculator applies simplified Porter-like suffix-stripping rules to identify likely suffixes, extract stems, and estimate morpheme counts.
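The suffix-stripping procedure the calculator describes can be sketched as follows (the rule list and the minimum stem length of 3 are illustrative assumptions, far cruder than the full Porter algorithm):

```python
# Simplified Porter-like suffix stripping: try suffixes longest-first and
# strip the first match that leaves a plausible stem (length >= 3).
SUFFIXES = ["ations", "ation", "ness", "ing", "est", "ed", "er", "ly", "s"]

def strip_suffix(word):
    """Return (stem, suffix) if a rule applies, else (word, None)."""
    for suf in SUFFIXES:  # ordered longest-first so "-ing" beats "-g"-like overlaps
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)], suf
    return word, None

def morpheme_estimate(word):
    """Crude morpheme count: 2 if a suffix was stripped, else 1."""
    _, suffix = strip_suffix(word)
    return 2 if suffix else 1

print(strip_suffix("walking"))  # → ('walk', 'ing')
print(strip_suffix("cat"))      # → ('cat', None)
```

Longest-first matching matters: without it, "walking" would lose only "-g"-less candidates like "-s"-style short suffixes before the correct "-ing" rule is ever tried.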



References

  1. Cotterell, R., Kirov, C., Sylak-Glassman, J., Walther, G., Vylomova, E., Xia, P., ... & Hulden, M. (2018). The CoNLL-SIGMORPHON 2018 shared task: Universal morphological reinflection. Proceedings of CoNLL-SIGMORPHON, 1–27. doi:10.18653/v1/K18-3001
  2. Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: Implicative patterns in inflectional paradigms. In J. P. Blevins & J. Blevins (Eds.), Analogy in Grammar (pp. 54–82). Oxford University Press.
  3. Wu, S., Cotterell, R., & Hulden, M. (2021). Applying the transformer to character-level transduction. Proceedings of the 16th Conference of the European Chapter of the ACL, 1901–1907. doi:10.18653/v1/2021.eacl-main.163
