Chart Parsing

Chart parsing is a general framework for parsing context-free grammars that stores intermediate results in a chart (also called a well-formed substring table) to avoid redundant computation. The chart records all constituents that have been discovered for each substring of the input, ensuring that no constituent is derived more than once. Both the CYK and Earley algorithms can be viewed as specific chart parsing strategies.

The Chart Data Structure

Chart Entries Passive edge: [A, i, j] — nonterminal A spans positions i to j
Active edge: [A → α • β, i, j] — partially recognized rule

Fundamental rule of chart parsing:
Active [A → α • B β, i, k] + Passive [B, k, j]
⇒ [A → α B • β, i, j]

The chart contains two types of entries: passive edges, which represent completed constituents (a nonterminal A spanning from position i to position j), and active edges, which represent partially recognized rules (a rule whose right-hand side has been partially matched). The fundamental rule of chart parsing combines an active edge expecting nonterminal B with a passive edge for B to produce a new (possibly still active) edge.

Parsing Strategies

Different chart parsing strategies differ in the order in which edges are added to the chart. Bottom-up strategies (like CYK) start from the words and build larger constituents. Top-down strategies start from the start symbol and predict what constituents are needed. Left-corner strategies combine both: they proceed bottom-up from the left corner of each rule but top-down for the rest. The agenda-based architecture maintains a priority queue of edges to process, enabling best-first search guided by probabilities or heuristics.

Packed Parse Forests

For ambiguous sentences, the chart implicitly represents an exponential number of parse trees in polynomial space. This packed representation, called a parse forest, can be unpacked on demand. Each cell can have multiple derivations, sharing common sub-derivations through the chart structure.

Efficiency and Applications

The key efficiency property of chart parsing is that each substring is analyzed at most once for each nonterminal, regardless of how many different parse trees use that constituent. This is critical for natural language, where ambiguity causes the number of parse trees to grow exponentially with sentence length. Chart parsing reduces this to polynomial-time recognition while still implicitly representing all parses. Modern statistical and neural parsers often use chart-based decoding to search over the space of possible trees efficiently.

Interactive Calculator

Enter CNF grammar rules (one per line, format A -> B C or A -> a) followed by a blank line and a sentence (words separated by spaces). The CYK algorithm builds a parse table and determines whether the sentence is in the language.

CNF Grammar + Sentence (blank line separates)

Click Calculate to see results, or Animate to watch the statistics update one record at a time.

The Chart Data Structure

Parsing Strategies

Efficiency and Applications

Interactive Calculator

References

External Links

The Chart Data Structure

Parsing Strategies

Efficiency and Applications

Interactive Calculator

Related Topics

References

External Links