
Neural Constituency Parsing

Neural constituency parsers use deep neural networks to score or directly generate parse trees, surpassing traditional statistical parsers through learned representations and achieving over 95% F1 on the Penn Treebank.

score(T) = ∑_{(i,j,l) ∈ T} s_θ(i, j, l), where s_θ is a neural span scorer

Neural constituency parsing applies deep learning architectures to the problem of predicting phrase-structure trees. These models replace the hand-crafted features and independence assumptions of statistical parsers with learned distributed representations that capture long-range contextual information. The two main paradigms are chart-based neural parsers, which score spans and use dynamic programming for decoding, and transition-based or sequence-to-sequence neural parsers, which generate trees incrementally.
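The factored tree score above can be sketched directly: a tree is a set of labeled spans, and its score is the sum of their individual scores. The lookup-table scorer below is a hypothetical stand-in for the trained neural scorer s_θ.

```python
def tree_score(tree_spans, span_scores):
    """Sum the scores of a tree's labeled spans.

    tree_spans: iterable of (i, j, label) tuples describing the tree.
    span_scores: dict mapping (i, j, label) -> float, standing in for
    the neural scorer s_theta (here just a lookup table).
    """
    return sum(span_scores.get(span, 0.0) for span in tree_spans)

# A toy 3-word sentence with hypothetical span scores.
scores = {(0, 3, "S"): 2.0, (0, 1, "NP"): 1.5, (1, 3, "VP"): 1.2}
tree = [(0, 3, "S"), (0, 1, "NP"), (1, 3, "VP")]
print(tree_score(tree, scores))  # approximately 4.7
```

Because the score decomposes over spans, the argmax over trees can be computed with dynamic programming rather than enumerating all trees.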

Chart-Based Neural Parsing

Span-Based Parsing (Stern et al., 2017; Kitaev & Klein, 2018) For each span (i, j) and label l:

s(i, j, l) = MLP(h_j − h_i) · r_l

h_i = contextual representation from a BiLSTM or Transformer encoder
r_l = learned label embedding

T* = argmax_T ∑_{(i,j,l) ∈ T} s(i, j, l)

Decoded via CYK in O(n³), or in O(n²) with greedy top-down splitting
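A minimal sketch of the span scorer s(i, j, l) = MLP(h_j − h_i) · r_l follows, with randomly initialized (untrained) weights standing in for a learned model; the dimensions and the single-hidden-layer MLP are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, hidden, n_labels = 6, 8, 16, 4   # sentence length and dims (hypothetical)

H = rng.normal(size=(n + 1, d))        # fencepost representations h_0 .. h_n
W1 = rng.normal(size=(d, hidden))      # MLP weights (untrained placeholders)
W2 = rng.normal(size=(hidden, d))
R = rng.normal(size=(n_labels, d))     # learned label embeddings r_l

def span_score(i, j, label):
    """Score a label on span (i, j) via an MLP over the difference h_j - h_i."""
    span_rep = H[j] - H[i]                       # subtract endpoint vectors
    hidden_act = np.maximum(0.0, span_rep @ W1)  # ReLU hidden layer
    return float((hidden_act @ W2) @ R[label])   # dot with label embedding

print(span_score(0, 3, 2))
```

Note that only n + 1 vectors are needed to score all O(n²) spans, which is what makes the factored chart approach efficient.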

Chart-based neural parsers compute a score for each possible labeled span using neural representations, then find the highest-scoring tree using dynamic programming. Stern et al. (2017) used a BiLSTM encoder with span representations formed by subtracting endpoint vectors. Kitaev and Klein (2018) achieved a major breakthrough by using a self-attention Transformer encoder, reaching 95.1% F1 on the Penn Treebank. Their model uses a factored approach where span scores decompose into a sum over individual labeled spans, enabling efficient CYK-style decoding.
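The CYK-style decoding step can be sketched as the recurrence best(i, j) = max_l s(i, j, l) + max_k [best(i, k) + best(k, j)]. The scorer below is a hypothetical callable; a real parser would back it with the neural span scores.

```python
def cyk_decode(n, score):
    """Return the max tree score and backpointers for a length-n sentence.

    score(i, j) must return (best_label, best_label_score) for span (i, j).
    """
    best = {}   # (i, j) -> best subtree score
    back = {}   # (i, j) -> (label, split point or None for length-1 spans)
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            label, lscore = score(i, j)
            if length == 1:
                best[i, j] = lscore
                back[i, j] = (label, None)
            else:
                # Choose the split maximizing the sum of child subtree scores.
                k = max(range(i + 1, j), key=lambda k: best[i, k] + best[k, j])
                best[i, j] = lscore + best[i, k] + best[k, j]
                back[i, j] = (label, k)
    return best[0, n], back

# Toy scorer: prefer an NP over the first two words, otherwise neutral.
def toy_score(i, j):
    return ("NP", 1.0) if (i, j) == (0, 2) else ("X", 0.0)

total, back = cyk_decode(3, toy_score)
print(total, back[0, 3])  # best tree keeps the (0, 2) NP, so it splits at k=2
```

The three nested loops (length, start, split) give the O(n³) complexity mentioned above; greedy top-down splitting trades exactness for O(n²).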

Sequence-to-Sequence Parsing

An alternative approach linearizes parse trees as sequences and uses sequence-to-sequence models to generate them. Vinyals et al. (2015) showed that an attention-based encoder-decoder model could produce reasonable parse trees when trained on linearized Penn Treebank trees. Later work by Choe and Charniak (2016) used neural language models over linearized trees for reranking. While these approaches are conceptually simple, chart-based methods generally achieve higher accuracy because they exploit the structural constraints of valid trees.
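Linearization in the style of Vinyals et al. (2015) turns a tree into a flat token sequence with typed closing brackets and POS tags normalized to XX; the nested-tuple tree format below is an assumption for illustration.

```python
def linearize(tree):
    """Linearize (label, children-or-word) tuples into a token list."""
    label, children = tree
    if isinstance(children, str):          # preterminal: normalize POS to XX
        return ["XX"]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)             # typed closing bracket
    return tokens

tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBD", "slept")])])
print(" ".join(linearize(tree)))
# (S (NP XX XX )NP (VP XX )VP )S
```

A seq2seq model trained on such sequences must learn bracket matching implicitly, which is one reason chart-based methods, with validity built into the decoder, tend to be more accurate.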

Pre-trained Language Models

The integration of pre-trained language models dramatically boosted constituency parsing accuracy. Kitaev et al. (2019) combined their chart parser with BERT representations, achieving 95.7% F1. Using XLNet pushed this to 96.1%, and subsequent work with larger pre-trained models has further improved results, approaching the estimated ceiling of human agreement (~97%).

Current State of the Art

Modern neural constituency parsers achieve remarkable accuracy: over 96% F1 on the Penn Treebank WSJ test set, compared to ~91% for the best pre-neural statistical parsers. Key factors driving this improvement include contextual word representations from Transformers, self-attention mechanisms that capture long-range dependencies, and pre-trained language models that provide rich linguistic knowledge. These parsers also generalize better to out-of-domain text and to other languages, especially when combined with multilingual pre-trained models.
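The F1 figures quoted throughout are labeled bracketing F1: precision and recall over the (label, i, j) spans of predicted versus gold trees. The sketch below is a simplification of the standard evalb metric, using multisets so duplicate brackets are counted.

```python
from collections import Counter

def bracket_f1(gold_spans, pred_spans):
    """F1 over labeled spans given as (label, i, j) tuples."""
    gold, pred = Counter(gold_spans), Counter(pred_spans)
    matched = sum((gold & pred).values())   # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
pred = [("S", 0, 3), ("NP", 0, 2), ("VP", 1, 3)]
print(bracket_f1(gold, pred))  # 2 of 3 spans match, so F1 is about 0.667
```

Real evaluation (evalb) additionally ignores punctuation and certain labels by convention, details omitted here.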


References

  1. Kitaev, N., & Klein, D. (2018). Constituency parsing with a self-attentive encoder. Proceedings of ACL 2018, 2676–2686. https://doi.org/10.18653/v1/P18-1249
  2. Kitaev, N., Cao, S., & Klein, D. (2019). Multilingual constituency parsing with self-attention and pre-training. Proceedings of ACL 2019, 3499–3505. https://doi.org/10.18653/v1/P19-1340
  3. Stern, M., Andreas, J., & Klein, D. (2017). A minimal span-based neural constituency parser. Proceedings of ACL 2017, 818–827. https://doi.org/10.18653/v1/P17-1076
  4. Vinyals, O., Kaiser, Ł., Koo, T., Petrov, S., Sutskever, I., & Hinton, G. (2015). Grammar as a foreign language. Advances in Neural Information Processing Systems 28, 2773–2781.
