
Neural Constituency Parsing

Neural constituency parsers use deep neural networks to score or directly generate parse trees, surpassing traditional statistical parsers through learned representations and achieving over 95% F1 on the Penn Treebank.

score(T) = ∑_{(i,j,l) ∈ T} s_θ(i, j, l), where s_θ is a neural span scorer

Neural constituency parsing applies deep learning architectures to the problem of predicting phrase-structure trees. These models replace the hand-crafted features and independence assumptions of statistical parsers with learned distributed representations that capture long-range contextual information. The two main paradigms are chart-based neural parsers, which score spans and use dynamic programming for decoding, and transition-based or sequence-to-sequence neural parsers, which generate trees incrementally.
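The factored tree score above can be sketched directly: a tree is a set of labeled spans, and its score is the sum of their individual scores. The lookup-table scorer below is a hypothetical stand-in for the trained neural scorer s_θ.

```python
def tree_score(tree_spans, span_scores):
    """Sum the scores of a tree's labeled spans.

    tree_spans: iterable of (i, j, label) tuples describing the tree.
    span_scores: dict mapping (i, j, label) -> float, standing in for
    the neural scorer s_theta (here just a lookup table).
    """
    return sum(span_scores.get(span, 0.0) for span in tree_spans)

# A toy 3-word sentence with hypothetical span scores.
scores = {(0, 3, "S"): 2.0, (0, 1, "NP"): 1.5, (1, 3, "VP"): 1.2}
tree = [(0, 3, "S"), (0, 1, "NP"), (1, 3, "VP")]
print(tree_score(tree, scores))  # approximately 4.7
```

Because the score decomposes over spans, the argmax over trees can be computed with dynamic programming rather than enumerating all trees.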

Chart-Based Neural Parsing

Span-Based Parsing (Stern et al., 2017; Kitaev & Klein, 2018) For each span (i, j) and label l:

s(i, j, l) = MLP(h_j − h_i) · r_l

h_i = contextual representation from a BiLSTM or Transformer encoder
r_l = learned label embedding

T* = argmax_T ∑_{(i,j,l) ∈ T} s(i, j, l)

Decoded via CYK in O(n³), or in O(n²) with greedy top-down splitting
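A minimal sketch of the span scorer s(i, j, l) = MLP(h_j − h_i) · r_l follows, with randomly initialized (untrained) weights standing in for a learned model; the dimensions and the single-hidden-layer MLP are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, hidden, n_labels = 6, 8, 16, 4   # sentence length and dims (hypothetical)

H = rng.normal(size=(n + 1, d))        # fencepost representations h_0 .. h_n
W1 = rng.normal(size=(d, hidden))      # MLP weights (untrained placeholders)
W2 = rng.normal(size=(hidden, d))
R = rng.normal(size=(n_labels, d))     # learned label embeddings r_l

def span_score(i, j, label):
    """Score a label on span (i, j) via an MLP over the difference h_j - h_i."""
    span_rep = H[j] - H[i]                       # subtract endpoint vectors
    hidden_act = np.maximum(0.0, span_rep @ W1)  # ReLU hidden layer
    return float((hidden_act @ W2) @ R[label])   # dot with label embedding

print(span_score(0, 3, 2))
```

Note that only n + 1 vectors are needed to score all O(n²) spans, which is what makes the factored chart approach efficient.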

Chart-based neural parsers compute a score for each possible labeled span using neural representations, then find the highest-scoring tree using dynamic programming. Stern et al. (2017) used a BiLSTM encoder with span representations formed by subtracting endpoint vectors. Kitaev and Klein (2018) achieved a major breakthrough by using a self-attention Transformer encoder, reaching 95.1% F1 on the Penn Treebank. Their model uses a factored approach where span scores decompose into a sum over individual labeled spans, enabling efficient CYK-style decoding.
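The CYK-style decoding step can be sketched as the recurrence best(i, j) = max_l s(i, j, l) + max_k [best(i, k) + best(k, j)]. The scorer below is a hypothetical callable; a real parser would back it with the neural span scores.

```python
def cyk_decode(n, score):
    """Return the max tree score and backpointers for a length-n sentence.

    score(i, j) must return (best_label, best_label_score) for span (i, j).
    """
    best = {}   # (i, j) -> best subtree score
    back = {}   # (i, j) -> (label, split point or None for length-1 spans)
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            label, lscore = score(i, j)
            if length == 1:
                best[i, j] = lscore
                back[i, j] = (label, None)
            else:
                # Choose the split maximizing the sum of child subtree scores.
                k = max(range(i + 1, j), key=lambda k: best[i, k] + best[k, j])
                best[i, j] = lscore + best[i, k] + best[k, j]
                back[i, j] = (label, k)
    return best[0, n], back

# Toy scorer: prefer an NP over the first two words, otherwise neutral.
def toy_score(i, j):
    return ("NP", 1.0) if (i, j) == (0, 2) else ("X", 0.0)

total, back = cyk_decode(3, toy_score)
print(total, back[0, 3])  # best tree keeps the (0, 2) NP, so it splits at k=2
```

The three nested loops (length, start, split) give the O(n³) complexity mentioned above; greedy top-down splitting trades exactness for O(n²).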

Sequence-to-Sequence Parsing

An alternative approach linearizes parse trees as sequences and uses sequence-to-sequence models to generate them. Vinyals et al. (2015) showed that an attention-based encoder-decoder model could produce reasonable parse trees when trained on linearized Penn Treebank trees. Later work by Choe and Charniak (2016) used neural language models over linearized trees for reranking. While these approaches are conceptually simple, chart-based methods generally achieve higher accuracy because they exploit the structural constraints of valid trees.
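Linearization in the style of Vinyals et al. (2015) turns a tree into a flat token sequence with typed closing brackets and POS tags normalized to XX; the nested-tuple tree format below is an assumption for illustration.

```python
def linearize(tree):
    """Linearize (label, children-or-word) tuples into a token list."""
    label, children = tree
    if isinstance(children, str):          # preterminal: normalize POS to XX
        return ["XX"]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)             # typed closing bracket
    return tokens

tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBD", "slept")])])
print(" ".join(linearize(tree)))
# (S (NP XX XX )NP (VP XX )VP )S
```

A seq2seq model trained on such sequences must learn bracket matching implicitly, which is one reason chart-based methods, with validity built into the decoder, tend to be more accurate.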

Pre-trained Language Models

The integration of pre-trained language models dramatically boosted constituency parsing accuracy. Kitaev et al. (2019) combined their chart parser with BERT representations, achieving 95.7% F1. Using XLNet pushed this to 96.1%, and subsequent work with larger pre-trained models has further improved results, approaching the estimated ceiling of human agreement (~97%).

Current State of the Art

Modern neural constituency parsers achieve remarkable accuracy: over 96% F1 on the Penn Treebank WSJ test set, compared to ~91% for the best pre-neural statistical parsers. Key factors driving this improvement include contextual word representations from Transformers, self-attention mechanisms that capture long-range dependencies, and pre-trained language models that provide rich linguistic knowledge. These parsers also generalize better to out-of-domain text and to other languages, especially when combined with multilingual pre-trained models.
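The F1 figures quoted throughout are labeled bracketing F1: precision and recall over the (label, i, j) spans of predicted versus gold trees. The sketch below is a simplification of the standard evalb metric, using multisets so duplicate brackets are counted.

```python
from collections import Counter

def bracket_f1(gold_spans, pred_spans):
    """F1 over labeled spans given as (label, i, j) tuples."""
    gold, pred = Counter(gold_spans), Counter(pred_spans)
    matched = sum((gold & pred).values())   # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
pred = [("S", 0, 3), ("NP", 0, 2), ("VP", 1, 3)]
print(bracket_f1(gold, pred))  # 2 of 3 spans match, so F1 is about 0.667
```

Real evaluation (evalb) additionally ignores punctuation and certain labels by convention, details omitted here.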


References

  1. Kitaev, N., & Klein, D. (2018). Constituency parsing with a self-attentive encoder. Proceedings of ACL 2018, 2676–2686. https://doi.org/10.18653/v1/P18-1249
  2. Kitaev, N., Cao, S., & Klein, D. (2019). Multilingual constituency parsing with self-attention and pre-training. Proceedings of ACL 2019, 3499–3505. https://doi.org/10.18653/v1/P19-1340
  3. Stern, M., Andreas, J., & Klein, D. (2017). A minimal span-based neural constituency parser. Proceedings of ACL 2017, 818–827. https://doi.org/10.18653/v1/P17-1076
  4. Vinyals, O., Kaiser, Ł., Koo, T., Petrov, S., Sutskever, I., & Hinton, G. (2015). Grammar as a foreign language. Advances in Neural Information Processing Systems 28, 2773–2781.
