Transition-Based Parsing

Transition-based dependency parsing builds a parse tree incrementally through a sequence of shift and reduce actions applied to a stack-buffer configuration, achieving linear-time parsing.

Transition-based dependency parsing constructs a dependency tree by applying a sequence of actions (transitions) to a configuration consisting of a stack, an input buffer, and a set of arcs built so far. At each step, a classifier selects the next action based on features of the current configuration. Because each action takes constant time and the number of actions is linear in the sentence length, transition-based parsers run in O(n) time, making them among the fastest parsing algorithms.

Arc-Standard System

Arc-Standard Transitions

Configuration: c = (σ, β, A), where σ is the stack, β the input buffer, and A the set of dependency arcs built so far.

SHIFT: (σ, wi|β, A) ⇒ (σ|wi, β, A)
LEFT-ARC_l: (σ|wi|wj, β, A) ⇒ (σ|wj, β, A ∪ {(wj, l, wi)})
RIGHT-ARC_l: (σ|wi|wj, β, A) ⇒ (σ|wi, β, A ∪ {(wi, l, wj)})

Initial: ([root], w1...wn, ∅)
Terminal: ([root], ∅, A)

The arc-standard system, introduced by Nivre (2004), uses three transitions. SHIFT moves the next word from the buffer onto the stack. LEFT-ARC makes the top of the stack the head of the second stack element and removes that second element. RIGHT-ARC makes the second element the head of the top and removes the top. The arc-eager variant allows arcs to be created earlier, as soon as a head-dependent pair is identified, which can improve accuracy for certain constructions.
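The three transitions can be sketched as a small state machine. This is an illustrative implementation, not taken from any library: names like `initial_config` and `apply_transition` are my own, words are represented by integer indices, and index 0 stands for the artificial root.

```python
# Minimal sketch of the arc-standard transition system (illustrative names;
# words are integer indices, 0 is the artificial root).

def initial_config(n):
    """Initial configuration: ([root], w1...wn, empty arc set)."""
    return ([0], list(range(1, n + 1)), set())

def apply_transition(config, action, label=None):
    stack, buffer, arcs = config
    if action == "SHIFT":                  # (σ, wi|β, A) => (σ|wi, β, A)
        return (stack + [buffer[0]], buffer[1:], arcs)
    wi, wj = stack[-2], stack[-1]          # second element, top of stack
    if action == "LEFT-ARC":               # wj heads wi; pop wi
        return (stack[:-2] + [wj], buffer, arcs | {(wj, label, wi)})
    if action == "RIGHT-ARC":              # wi heads wj; pop wj
        return (stack[:-1], buffer, arcs | {(wi, label, wj)})
    raise ValueError(f"unknown action: {action}")

def is_terminal(config):
    stack, buffer, _ = config
    return stack == [0] and not buffer

# Parse the two-word sentence "He sleeps" (indices 1, 2):
c = initial_config(2)
for step in [("SHIFT", None), ("SHIFT", None),
             ("LEFT-ARC", "nsubj"),        # sleeps -> He
             ("RIGHT-ARC", "root")]:       # root -> sleeps
    c = apply_transition(c, *step)
assert is_terminal(c)
print(sorted(c[2]))
# -> [(0, 'root', 2), (2, 'nsubj', 1)]
```

Note that a complete derivation for a sentence of n words always takes exactly 2n transitions (each word is shifted once and attached once), which is where the linear-time guarantee mentioned earlier comes from.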

Training and Classification

The transition classifier is trained on oracle transition sequences derived from gold-standard dependency trees. Classical approaches used SVMs or perceptrons over hand-crafted features (word forms, POS tags, and already-built arcs on the stack). Chen and Manning (2014) introduced neural transition-based parsing, using a feed-forward network over dense embeddings, which achieved comparable accuracy with much simpler feature engineering. Kiperwasser and Goldberg (2016) obtained further gains by feeding BiLSTM representations to the transition classifier, and Dozat and Manning (2017) showed that BiLSTM encoders with biaffine scoring also yield state-of-the-art graph-based parsers.
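The oracle mentioned above can be sketched for the arc-standard system as follows: given the gold head of each word, it reconstructs the transition sequence the classifier is trained to imitate. This assumes a projective gold tree, and the function name is illustrative.

```python
# Sketch of a static oracle for arc-standard parsing (illustrative, assumes
# a projective gold tree). heads[i] is the gold head of word i; heads[0]
# is a dummy value for the artificial root.

def oracle_sequence(heads):
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    remaining = [0] * len(heads)         # unattached gold dependents per word
    for h in heads[1:]:
        remaining[h] += 1
    actions = []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            wi, wj = stack[-2], stack[-1]    # second element, top of stack
            if heads[wi] == wj:              # LEFT-ARC: wj heads wi
                actions.append("LEFT-ARC")
                stack.pop(-2)
                remaining[wj] -= 1
                continue
            # RIGHT-ARC only once wj has collected all its own dependents,
            # since popping it early would make them unreachable.
            if heads[wj] == wi and remaining[wj] == 0:
                actions.append("RIGHT-ARC")
                stack.pop()
                remaining[wi] -= 1
                continue
        actions.append("SHIFT")
        stack.append(buffer.pop(0))
    return actions

# Gold tree for "He sleeps": head of word 1 is 2, head of word 2 is root (0).
print(oracle_sequence([-1, 2, 0]))
# -> ['SHIFT', 'SHIFT', 'LEFT-ARC', 'RIGHT-ARC']
```

The delay on RIGHT-ARC is the defining property of such a static oracle: it commits to one canonical derivation per tree, which is exactly the limitation that the non-deterministic (dynamic) oracles of Goldberg and Nivre (2013) relax.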

Error Propagation

A key weakness of greedy transition-based parsing is error propagation: an incorrect early decision cannot be undone and may cause subsequent errors. Beam search (maintaining the top-k configurations) and global training objectives (structured perceptron, global normalization) mitigate this problem, at the cost of a factor-of-k slowdown at parsing time and more expensive training.
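The beam-search idea can be sketched with a toy scorer. Everything here is illustrative: arcs are unlabeled, the function name is my own, and the stand-in scorer just assigns fixed action scores, whereas a real parser would use the trained classifier's (log-)scores for each action in context.

```python
# Toy sketch of beam search over arc-standard configurations. A real parser
# would score each action with the trained classifier; here score_fn is a
# stand-in that depends only on the action name.
import heapq

def beam_parse(n, score_fn, beam_size=4):
    # Configuration: (stack, buffer, unlabeled arcs, cumulative score).
    beam = [([0], tuple(range(1, n + 1)), frozenset(), 0.0)]
    while any(c[1] or len(c[0]) > 1 for c in beam):
        candidates = []
        for stack, buffer, arcs, s in beam:
            if not buffer and len(stack) == 1:   # terminal: carry along
                candidates.append((stack, buffer, arcs, s))
                continue
            if buffer:                           # SHIFT
                candidates.append((stack + [buffer[0]], buffer[1:], arcs,
                                   s + score_fn("SHIFT")))
            if len(stack) >= 2:
                wi, wj = stack[-2], stack[-1]
                if wi != 0:                      # root may not be a dependent
                    candidates.append((stack[:-2] + [wj], buffer,
                                       arcs | {(wj, wi)},
                                       s + score_fn("LEFT-ARC")))
                candidates.append((stack[:-1], buffer, arcs | {(wi, wj)},
                                   s + score_fn("RIGHT-ARC")))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[3])
    return beam[0]                               # highest-scoring derivation

scores = {"SHIFT": 0.0, "LEFT-ARC": 1.0, "RIGHT-ARC": 0.5}
best = beam_parse(2, lambda a: scores[a])
print(sorted(best[2]), best[3])
# -> [(0, 2), (2, 1)] 1.5
```

With beam_size = 1 this degenerates to the greedy parser; larger beams keep alternative configurations alive so that a locally worse action can still win on total derivation score.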

Transition-based parsers dominate practical applications where speed is critical, such as real-time text processing pipelines. The spaCy NLP library, for example, uses a transition-based parser as its default. Despite their greedy nature, modern neural transition-based parsers with beam search or dynamic oracles achieve accuracy comparable to graph-based alternatives, while maintaining substantially faster parsing speed.

References

  1. Nivre, J. (2004). Incrementality in deterministic dependency parsing. Proceedings of the ACL Workshop on Incremental Parsing, 50–57. https://doi.org/10.3115/1613148.1613156
  2. Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. Proceedings of EMNLP 2014, 740–750. https://doi.org/10.3115/v1/D14-1082
  3. Nivre, J. (2008). Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4), 513–553. https://doi.org/10.1162/coli.07-056-R1-07-027
  4. Goldberg, Y., & Nivre, J. (2013). Training deterministic parsers with non-deterministic oracles. Transactions of the ACL, 1, 403–414. https://doi.org/10.1162/tacl_a_00237
