Computational Linguistics
About

Chart Parsing

Chart parsing is a family of dynamic programming parsing strategies that use a shared data structure (the chart) to avoid redundant recomputation of subtrees, enabling efficient parsing of ambiguous grammars.

chart[i][j] = {A | A ⇒* w_i ... w_j}

Chart parsing is a general framework for parsing context-free grammars that stores intermediate results in a chart (also called a well-formed substring table) to avoid redundant computation. The chart records all constituents that have been discovered for each substring of the input, ensuring that no constituent is derived more than once. Both the CYK and Earley algorithms can be viewed as specific chart parsing strategies.

The Chart Data Structure

Chart Entries Passive edge: [A, i, j] — nonterminal A spans positions i to j
Active edge: [A → α • β, i, j] — partially recognized rule

Fundamental rule of chart parsing:
Active [A → α • B β, i, k] + Passive [B, k, j]
⇒ [A → α B • β, i, j]

The chart contains two types of entries: passive edges, which represent completed constituents (a nonterminal A spanning from position i to position j), and active edges, which represent partially recognized rules (a rule whose right-hand side has been partially matched). The fundamental rule of chart parsing combines an active edge expecting nonterminal B with a passive edge for B to produce a new (possibly still active) edge.

Parsing Strategies

Different chart parsing strategies differ in the order in which edges are added to the chart. Bottom-up strategies (like CYK) start from the words and build larger constituents. Top-down strategies start from the start symbol and predict what constituents are needed. Left-corner strategies combine both: they proceed bottom-up from the left corner of each rule but top-down for the rest. The agenda-based architecture maintains a priority queue of edges to process, enabling best-first search guided by probabilities or heuristics.

Packed Parse Forests
For ambiguous sentences, the chart implicitly represents an exponential number of parse trees in polynomial space. This packed representation, called a parse forest, can be unpacked on demand. Each cell can have multiple derivations, sharing common sub-derivations through the chart structure.

Efficiency and Applications

The key efficiency property of chart parsing is that each substring is analyzed at most once for each nonterminal, regardless of how many different parse trees use that constituent. This is critical for natural language, where ambiguity causes the number of parse trees to grow exponentially with sentence length. Chart parsing reduces this to polynomial-time recognition while still implicitly representing all parses. Modern statistical and neural parsers often use chart-based decoding to search over the space of possible trees efficiently.

Interactive Calculator

Enter CNF grammar rules (one per line, format A -> B C or A -> a) followed by a blank line and a sentence (words separated by spaces). The CYK algorithm builds a parse table and determines whether the sentence is in the language.

Click Calculate to see results, or Animate to watch the statistics update one record at a time.

Related Topics

References

  1. Kay, M. (1986). Algorithm schemata and data structures in syntactic processing. In B. J. Grosz, K. Sparck Jones, & B. L. Webber (Eds.), Readings in natural language processing (pp. 35–70). Morgan Kaufmann.
  2. Thompson, H. (1983). Chart parsing and rule schemata in phrase structure grammar. Proceedings of the 21st Annual Meeting of the ACL, 167–172. https://doi.org/10.3115/981311.981340
  3. Jurafsky, D., & Martin, J. H. (2024). Speech and language processing (3rd ed. draft). Pearson. https://web.stanford.edu/~jurafsky/slp3/
  4. Stolcke, A. (1995). An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2), 165–201. https://doi.org/10.5555/211190.211197

External Links