
Hierarchical Phrase-Based SMT

Hierarchical phrase-based SMT extends flat phrase-based models with synchronous context-free grammar rules that contain nonterminal symbols, enabling the capture of recursive and long-distance reordering patterns without requiring explicit syntactic annotation.

X → ⟨γ, α, ~⟩ where γ ∈ (Σ_f ∪ {X})*, α ∈ (Σ_e ∪ {X})*

Hierarchical phrase-based SMT (Chiang, 2005, 2007) addresses the fundamental limitation of flat phrase-based models: their inability to capture nested and long-distance reordering patterns. By extending phrase pairs to include nonterminal symbols, hierarchical models can represent recursive translation patterns. For instance, a rule X → ⟨X₁ de X₂, X₂ of X₁⟩ captures the systematic reordering of possessive constructions between Chinese and English. These rules are learned automatically from parallel text without syntactic annotation, using only the word alignments.

Synchronous Context-Free Grammar

Hierarchical Translation Rules X → ⟨γ, α, ~⟩

γ = source side (terminals and nonterminals)
α = target side (terminals and nonterminals)
~ = one-to-one correspondence between nonterminals

Examples:
X → ⟨X₁ de X₂, X₂ of X₁⟩ (reordering)
X → ⟨acheter X₁, buy X₁⟩ (monotone)
X → ⟨maison, house⟩ (lexical phrase pair)
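
Concretely, a rule of this form is just a pair of symbol sequences plus the nonterminal correspondence. The sketch below (Python; `Rule` and `apply_rule` are hypothetical names for illustration, not part of any toolkit) encodes the three example rules and applies one by substituting translated sub-phrases for the co-indexed nonterminals:

```python
from typing import Dict, NamedTuple, Tuple

class Rule(NamedTuple):
    """One SCFG rule X -> <src, tgt>; "X1"/"X2" mark co-indexed nonterminals."""
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]

# The example rules from above (a toy grammar, not learned from data)
REORDER = Rule(("X1", "de", "X2"), ("X2", "of", "X1"))
MONOTONE = Rule(("acheter", "X1"), ("buy", "X1"))
LEXICAL = Rule(("maison",), ("house",))

def apply_rule(rule: Rule, fillers: Dict[str, str]) -> str:
    """Build the target string, substituting a translation for each nonterminal."""
    return " ".join(fillers.get(sym, sym) for sym in rule.tgt)

print(apply_rule(REORDER, {"X1": "china", "X2": "capital"}))
```

With X₁ translated as "china" and X₂ as "capital", the reordering rule yields "capital of china", inverting the source order as the correspondence dictates.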

Hierarchical rules are extracted from word-aligned parallel text by identifying phrase pairs (as in standard phrase-based SMT) and then generalizing them by replacing sub-phrase pairs with nonterminal symbols. To keep the grammar manageable, extraction is constrained in practice: Chiang (2007) limits rules to at most two nonterminals, disallows adjacent nonterminals on the source side, and bounds rule length. The resulting synchronous context-free grammar (SCFG) typically uses a single nonterminal category X plus a sentence-level start symbol S. This minimal syntactic structure is sufficient to capture a wide range of reordering phenomena while keeping the grammar tractable for decoding. Rules are scored with the same features as phrase pairs — forward and inverse translation probabilities, lexical weights — plus additional features for rule type and arity.
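
The extract-then-generalize procedure can be sketched as follows. This is a simplified illustration (hypothetical function names; it substitutes only one nonterminal per rule, whereas full extraction also produces two-nonterminal rules subject to the constraints above):

```python
def extract_phrase_pairs(f, e, align, max_len=5):
    """Phrase pairs consistent with the word alignment: no link crosses the boundary."""
    pairs = set()
    for i1 in range(len(f)):
        for i2 in range(i1, min(i1 + max_len, len(f))):
            linked = [j for (i, j) in align if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # consistent iff every link into the target span comes from inside the source span
            if all(i1 <= i <= i2 for (i, j) in align if j1 <= j <= j2):
                pairs.add((i1, i2, j1, j2))
    return pairs

def generalize(f, e, pairs):
    """Flat rules, plus hierarchical rules made by replacing one sub-phrase pair with X1."""
    rules = set()
    for (i1, i2, j1, j2) in pairs:
        rules.add((tuple(f[i1:i2 + 1]), tuple(e[j1:j2 + 1])))  # flat phrase pair
        for (k1, k2, l1, l2) in pairs:
            inside = i1 <= k1 and k2 <= i2 and j1 <= l1 and l2 <= j2
            if inside and (k1, k2, l1, l2) != (i1, i2, j1, j2):
                src = tuple(f[i1:k1]) + ("X1",) + tuple(f[k2 + 1:i2 + 1])
                tgt = tuple(e[j1:l1]) + ("X1",) + tuple(e[l2 + 1:j2 + 1])
                rules.add((src, tgt))
    return rules

# Toy example with a crossing alignment: zhongguo de shoudu <-> capital of china
f = ["zhongguo", "de", "shoudu"]
e = ["capital", "of", "china"]
align = {(0, 2), (1, 1), (2, 0)}          # (source index, target index)
rules = generalize(f, e, extract_phrase_pairs(f, e, align))
```

Among the extracted rules are ⟨X₁ de shoudu, capital of X₁⟩ and ⟨zhongguo de X₁, X₁ of china⟩; substituting both sub-phrases at once, as full extraction does, yields the ⟨X₁ de X₂, X₂ of X₁⟩ rule from the text.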

CYK Decoding

Decoding in hierarchical SMT uses a bottom-up CYK-style chart parsing algorithm on the source sentence. Each cell in the chart stores the best translations for a source span, computed by combining translations of sub-spans according to the grammar rules. Language model integration requires maintaining target-side boundary words in each chart cell, and cube pruning (Chiang, 2007) is used to efficiently explore the combinatorial space of rule applications and language model contexts. Decoding time grows as O(n³) in the source sentence length (times a grammar-dependent constant), whereas beam-search phrase-based decoding with a fixed distortion limit is roughly linear in sentence length.
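
A stripped-down chart decoder might look as follows. This is a toy sketch with hypothetical names: it keeps a single derivation per span, hard-codes a monotone glue rule and the ⟨X₁ de X₂, X₂ of X₁⟩ rule, and omits scoring, language model states, and cube pruning entirely:

```python
def cyk_decode(f, lex):
    """Toy bottom-up CYK over source spans. chart[(i, j)] holds one translation
    per span; a real decoder keeps k-best items with LM boundary words."""
    n = len(f)
    chart = {}
    for i, w in enumerate(f):
        if w in lex:
            chart[(i, i)] = lex[w]                     # lexical rules seed the chart
    for width in range(2, n + 1):                      # increasing span widths
        for i in range(n - width + 1):
            j = i + width - 1
            for k in range(i, j):                      # glue: monotone concatenation
                if (i, k) in chart and (k + 1, j) in chart:
                    chart[(i, j)] = chart[(i, k)] + " " + chart[(k + 1, j)]
            for k in range(i + 1, j):                  # X -> <X1 de X2, X2 of X1>
                if f[k] == "de" and (i, k - 1) in chart and (k + 1, j) in chart:
                    chart[(i, j)] = chart[(k + 1, j)] + " of " + chart[(i, k - 1)]
    return chart.get((0, n - 1))

print(cyk_decode(["zhongguo", "de", "shoudu"],
                 {"zhongguo": "china", "shoudu": "capital"}))
```

On the running example this produces "capital of china" via the reordering rule, after the lexical rules have filled the two single-word spans.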

Syntax-Augmented Models

While Chiang's original model uses unlabeled nonterminals, subsequent work incorporated syntactic labels from parse trees. Syntax-augmented models (Zollmann and Venugopal, 2006) label nonterminals with syntactic categories from target-side parse trees, allowing the grammar to distinguish between NP, VP, and other constituents. String-to-tree, tree-to-string, and tree-to-tree models apply syntactic constraints on one or both sides of the translation. These syntactically informed models achieved further improvements, particularly for language pairs with significant structural divergences.

Impact and Limitations

Hierarchical phrase-based models achieved consistent improvements over flat phrase-based systems, particularly for language pairs with long-distance reordering such as Chinese-English and Arabic-English. The model's ability to represent nested reordering within a single formal framework was theoretically elegant and practically effective. Hierarchical SMT also influenced the development of syntax-based neural MT models and tree-based decoding strategies.

However, hierarchical models incur higher computational costs and substantially larger grammars than flat phrase-based systems. The single-nonterminal grammar, while simpler, may over-generate by permitting reorderings that no linguistically motivated grammar would produce. The tension between the expressiveness of the grammar and the tractability of decoding remains a central concern, and various pruning and filtering strategies are needed to make hierarchical systems practical at scale.


References

  1. Chiang, D. (2005). A hierarchical phrase-based model for statistical machine translation. Proceedings of ACL 2005, 263–270. doi:10.3115/1219840.1219873
  2. Chiang, D. (2007). Hierarchical phrase-based translation. Computational Linguistics, 33(2), 201–228. doi:10.1162/coli.2007.33.2.201
  3. Zollmann, A., & Venugopal, A. (2006). Syntax augmented machine translation via chart parsing. Proceedings of the NAACL 2006 Workshop on SMT, 138–141. aclanthology.org/W06-3119
