
Phrase Tables

Phrase tables are data structures that store bilingual phrase pairs together with associated translation probabilities, forming the core knowledge source in phrase-based statistical machine translation systems.

The forward phrase translation probability is typically estimated by relative frequency over phrase-pair counts extracted from the training corpus:

P(ē|f̄) = count(f̄, ē) / Σ_{ē'} count(f̄, ē')

A phrase table is a lookup table that maps source-language phrases to target-language phrases along with a set of feature scores, most notably the phrase translation probabilities in both directions. In phrase-based SMT, the phrase table is the primary repository of bilingual knowledge, extracted automatically from word-aligned parallel corpora. The quality of the phrase table directly determines the quality of translation, and considerable research has been devoted to improving phrase extraction, scoring, and filtering methods.
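The lookup structure described above can be sketched as a simple in-memory mapping from source phrases to scored translation options. This is an illustrative sketch, not any particular toolkit's format; the field names and example scores are invented for demonstration:

```python
from collections import defaultdict

# Minimal phrase table: source phrase -> list of scored translation options.
phrase_table = defaultdict(list)

def add_entry(src, tgt, p_fwd, p_inv, lex_fwd, lex_inv):
    """Store one phrase pair with the four standard feature scores."""
    phrase_table[src].append({
        "target": tgt,
        "p_tgt_given_src": p_fwd,    # forward phrase probability P(e|f)
        "p_src_given_tgt": p_inv,    # inverse phrase probability P(f|e)
        "lex_tgt_given_src": lex_fwd,  # forward lexical weighting
        "lex_src_given_tgt": lex_inv,  # inverse lexical weighting
    })

# Example entries (scores are illustrative, not from real data).
add_entry("la maison", "the house", 0.8, 0.7, 0.6, 0.5)
add_entry("la maison", "house", 0.2, 0.1, 0.3, 0.2)

# Decoding-time lookup: enumerate candidate translations of a source phrase.
for option in phrase_table["la maison"]:
    print(option["target"], option["p_tgt_given_src"])
```

A real decoder would load such entries from disk and combine the scores log-linearly with language model and distortion features, but the lookup itself is exactly this kind of keyed retrieval.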

Phrase Extraction

Phrase Extraction Criterion. A phrase pair (f̄, ē) is consistent with alignment A if:
∀ f_i ∈ f̄: (f_i, e_j) ∈ A → e_j ∈ ē
∀ e_j ∈ ē: (f_i, e_j) ∈ A → f_i ∈ f̄
∃ f_i ∈ f̄, e_j ∈ ē: (f_i, e_j) ∈ A

Extract all phrase pairs consistent with the word alignment up to a maximum phrase length

Phrase extraction follows the consistency criterion introduced by Och et al. (1999): a phrase pair is extracted if and only if no word inside the source phrase is aligned to a word outside the target phrase, and vice versa, and at least one alignment point exists within the phrase pair. Given a word-aligned sentence pair, all consistent phrase pairs up to a maximum length (typically 7 words) are extracted. This heuristic approach, while not derived from a rigorous probabilistic model, proved remarkably effective in practice.
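The consistency criterion translates directly into a nested loop over candidate source spans. The sketch below is a minimal version under simplifying assumptions: it keeps only the tightest target span for each source span and omits the usual expansion over unaligned boundary words; the function name and span representation are illustrative:

```python
def extract_phrases(alignment, src_len, max_len=7):
    """Return all phrase-pair spans consistent with the word alignment.

    alignment: set of (i, j) links from source position i to target position j.
    Spans are inclusive index pairs ((i_start, i_end), (j_start, j_end)).
    """
    phrases = set()
    for i_start in range(src_len):
        for i_end in range(i_start, min(i_start + max_len, src_len)):
            # Target positions linked to the candidate source span.
            linked = [j for i, j in alignment if i_start <= i <= i_end]
            if not linked:                    # need at least one alignment point
                continue
            j_start, j_end = min(linked), max(linked)
            if j_end - j_start + 1 > max_len:
                continue
            # Consistency: no word inside the target span may be aligned
            # to a source word outside the source span. (The source-side
            # condition holds by construction of j_start/j_end.)
            if any(j_start <= j <= j_end and not i_start <= i <= i_end
                   for i, j in alignment):
                continue
            phrases.add(((i_start, i_end), (j_start, j_end)))
    return phrases

# Two-word sentence pair with a diagonal alignment:
spans = extract_phrases({(0, 0), (1, 1)}, src_len=2)
print(sorted(spans))
```

On the diagonal two-word alignment this yields the two single-word pairs plus the full two-word pair, all of which satisfy the criterion.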

Feature Scores

Each phrase table entry typically includes four probability scores: the forward phrase translation probability P(ē|f̄), the inverse phrase translation probability P(f̄|ē), the forward lexical weighting score, and the inverse lexical weighting score. Lexical weighting provides a smoothed estimate based on word-level translation probabilities, helping to counteract the sparsity of phrase-level counts. These features, along with a phrase penalty, are combined in the log-linear model framework of the decoder.
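The forward and inverse phrase probabilities can be computed by relative frequency over the extracted phrase-pair counts, mirroring the estimation formula given earlier. A minimal sketch, with illustrative function names and toy data, and with the lexical weighting scores omitted:

```python
from collections import Counter

def score_phrase_pairs(extracted):
    """Relative-frequency estimates in both directions.

    extracted: list of (src, tgt) string pairs, one per extraction event.
    Returns {(src, tgt): {"p_tgt_given_src": P(e|f), "p_src_given_tgt": P(f|e)}}.
    """
    pair_counts = Counter(extracted)
    src_counts = Counter(s for s, _ in extracted)
    tgt_counts = Counter(t for _, t in extracted)
    return {
        (s, t): {
            "p_tgt_given_src": c / src_counts[s],  # P(e|f)
            "p_src_given_tgt": c / tgt_counts[t],  # P(f|e)
        }
        for (s, t), c in pair_counts.items()
    }

# Toy extraction counts: "la maison" seen 4 times, 3 as "the house".
pairs = [("la maison", "the house")] * 3 + [("la maison", "house")]
scores = score_phrase_pairs(pairs)
print(scores[("la maison", "the house")]["p_tgt_given_src"])  # 0.75
```

Lexical weighting would add a second pass that scores each pair word by word from a word-level translation table, which smooths exactly the sparse high-probability entries this count-based estimate produces.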

Phrase Table Filtering

Unfiltered phrase tables can be enormous — billions of entries for large parallel corpora — posing storage and speed challenges. Various filtering strategies have been proposed: count-based pruning removes entries below a frequency threshold; significance testing (Johnson et al., 2007) retains only phrase pairs whose co-occurrence is statistically significant; and entropy-based filtering removes entries with high translation ambiguity. These methods can reduce phrase table size by an order of magnitude with minimal impact on translation quality.
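Two of the strategies above, count-based pruning and entropy-based filtering, can be sketched over a table of raw phrase-pair counts. This is a simplified illustration (function name, thresholds, and data are assumptions, and significance testing is not shown):

```python
import math

def prune_phrase_table(counts, min_count=2, max_entropy=None):
    """Filter a phrase table given raw pair frequencies.

    counts: {(src, tgt): frequency}.
    Count-based pruning drops pairs seen fewer than min_count times;
    optional entropy filtering then drops source phrases whose
    translation distribution H(e|f) exceeds max_entropy bits.
    """
    kept = {p: c for p, c in counts.items() if c >= min_count}
    if max_entropy is None:
        return kept
    # Group surviving pairs by source phrase.
    by_src = {}
    for (s, t), c in kept.items():
        by_src.setdefault(s, {})[t] = c
    filtered = {}
    for s, dist in by_src.items():
        total = sum(dist.values())
        # Translation entropy of this source phrase, in bits.
        h = -sum((c / total) * math.log2(c / total) for c in dist.values())
        if h <= max_entropy:
            for t, c in dist.items():
                filtered[(s, t)] = c
    return filtered

counts = {("a", "x"): 5, ("a", "y"): 1, ("b", "z"): 3}
print(prune_phrase_table(counts, min_count=2))
```

Significance-based filtering works on the same counts but compares each pair's co-occurrence against what chance alignment of the marginals would predict, typically via Fisher's exact test.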

Limitations and Legacy

Phrase tables have several inherent limitations. They cannot generalize beyond observed phrases: an unseen word form or a novel combination will have no entry. They treat each phrase pair independently, ignoring broader sentence context. Long-range reordering across phrase boundaries requires separate distortion models. These limitations motivated hierarchical phrase-based models and ultimately the shift to neural machine translation, where translation knowledge is encoded in continuous model parameters rather than discrete tables.

Despite the dominance of neural MT, phrase tables retain relevance in specific scenarios. In domain adaptation, phrase tables extracted from in-domain data can be used to constrain neural outputs. In low-resource settings, phrase-level translation memories can supplement limited training data. The conceptual framework of phrase-based translation — decomposing translation into local phrase substitutions with reordering — also informs the analysis and interpretation of neural translation models.

References

  1. Koehn, P., Och, F. J., & Marcu, D. (2003). Statistical phrase-based translation. Proceedings of NAACL-HLT 2003, 48–54. doi:10.3115/1073445.1073462
  2. Och, F. J., Tillmann, C., & Ney, H. (1999). Improved alignment models for statistical machine translation. Proceedings of EMNLP/VLC 1999, 20–28. aclanthology.org/W99-0604
  3. Johnson, J. H., Martin, J., Foster, G., & Kuhn, R. (2007). Improving translation quality by discarding most of the phrasetable. Proceedings of EMNLP-CoNLL 2007, 967–975. aclanthology.org/D07-1103