
Probabilistic CFG

A probabilistic context-free grammar (PCFG) augments each production rule with a probability, enabling disambiguation by selecting the parse tree with the highest likelihood under the model.

P(tree) = ∏_{(A→α) ∈ tree} P(A→α | A)

The probability of a parse tree is the product of the conditional probabilities of all rules used in its derivation. For an ambiguous sentence, the PCFG selects the parse tree with the highest probability, providing a principled approach to disambiguation. Because rule probabilities are normalized per nonterminal, PCFGs also define a probability distribution over strings, making them generative models of language.

Formal Definition

PCFG G = (N, T, R, S, q)
q(A → α) = P(α | A) for each rule A → α ∈ R
Constraint: ∑_{α} q(A → α) = 1 for each A ∈ N

P(tree t) = ∏_{r ∈ t} q(r)
P(sentence w) = ∑_{t : yield(t) = w} P(t)
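As a concrete illustration of the product formula above, the sketch below computes P(t) for a hand-built tree. The toy grammar, its probabilities, and the tuple-based tree encoding are all invented for this example:

```python
# Toy PCFG fragment: q maps (lhs, rhs-tuple) -> conditional probability.
# Grammar and probabilities are invented for illustration.
q = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.4,
    ("NP", ("DT", "NN")): 0.6,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("DT", ("the",)): 1.0,
    ("NN", ("dog",)): 1.0,
}

def tree_prob(tree):
    """P(t) = product of q(r) over all rules r used in tree t.

    A tree is (label, child, ...) where each child is either a
    subtree tuple or a terminal string.
    """
    label, *children = tree
    # The right-hand side is the sequence of child labels/terminals.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = q[label, rhs]
    for child in children:
        if isinstance(child, tuple):
            p *= tree_prob(child)
    return p

t = ("S", ("NP", "she"),
         ("VP", ("V", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))
print(tree_prob(t))  # 1.0 * 0.4 * 1.0 * 1.0 * 0.6 * 1.0 * 1.0 ≈ 0.24
```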

The probability of a sentence is the sum of the probabilities of all its parse trees. The most likely parse for a sentence w is argmax_t P(t | w) = argmax_t P(t), since P(t | w) = P(t)/P(w) and P(w) is constant across the candidate parses. The Viterbi (highest-probability) parse can be found efficiently using the probabilistic CYK algorithm, and the inside-outside algorithm computes the total string probability and the marginal probabilities of constituents.
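A minimal probabilistic CYK (Viterbi) sketch: the grammar encoding and probabilities are invented, scoring is done in log space to avoid underflow, and for brevity it returns only the best log probability rather than the tree itself:

```python
import math
from collections import defaultdict

def viterbi_cyk(words, lex, bin_rules, start="S"):
    """Log probability of the Viterbi parse under a CNF PCFG.

    lex maps (A, word) -> q(A -> word); bin_rules maps
    (A, B, C) -> q(A -> B C). Returns -inf if the sentence
    is not in the language of the grammar.
    """
    n = len(words)
    best = defaultdict(lambda: -math.inf)  # (i, j, A) -> best log prob for span i..j
    for i, w in enumerate(words):
        for (A, word), p in lex.items():
            if word == w and math.log(p) > best[i, i + 1, A]:
                best[i, i + 1, A] = math.log(p)
    for span in range(2, n + 1):           # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # split point between the two children
                for (A, B, C), p in bin_rules.items():
                    cand = math.log(p) + best[i, k, B] + best[k, j, C]
                    if cand > best[i, j, A]:
                        best[i, j, A] = cand
    return best[0, n, start]

# Toy grammar (probabilities invented for illustration).
lex = {("NP", "she"): 0.4, ("DT", "the"): 1.0, ("NN", "dog"): 1.0, ("V", "saw"): 1.0}
bin_rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0, ("NP", "DT", "NN"): 0.6}
logp = viterbi_cyk("she saw the dog".split(), lex, bin_rules)
print(round(math.exp(logp), 4))  # 0.24
```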

Parameter Estimation

PCFG rule probabilities are typically estimated from a treebank using maximum likelihood estimation: q(A → α) = count(A → α) / count(A). When treebank data is unavailable, the inside-outside algorithm (a special case of EM) can estimate parameters from unannotated text. However, vanilla PCFGs have significant limitations: they assume context-free independence between rules, which means they cannot capture lexical preferences, structural dependencies, or the tendency of certain constructions to co-occur.
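The maximum likelihood estimate above is just relative-frequency counting over a treebank. A small sketch, where the tree encoding and the tiny treebank are invented for illustration:

```python
from collections import Counter

def mle_rule_probs(treebank):
    """Estimate q(A -> alpha) = count(A -> alpha) / count(A) from trees.

    A tree is (label, child, ...); terminal children are plain strings.
    """
    rule_counts, lhs_counts = Counter(), Counter()

    def visit(t):
        label, *children = t
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        rule_counts[label, rhs] += 1   # count(A -> alpha)
        lhs_counts[label] += 1         # count(A)
        for c in children:
            if isinstance(c, tuple):
                visit(c)

    for t in treebank:
        visit(t)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Tiny invented treebank: NP rewrites as "she" twice and as DT NN once.
treebank = [
    ("S", ("NP", "she"), ("VP", "runs")),
    ("S", ("NP", "she"), ("VP", "sleeps")),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", "barks")),
]
q = mle_rule_probs(treebank)
print(q[("NP", ("she",))])  # 2/3
```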

Limitations of Vanilla PCFGs
PCFGs assign the same probability to a rule regardless of its context in the tree. This means they cannot model lexical head preferences (e.g., that "put" requires a PP complement) or structural biases (e.g., right-branching preference in English). Lexicalized PCFGs and parent-annotated PCFGs address these limitations.

Extensions

Major extensions include lexicalized PCFGs (Collins, 1997; Charniak, 1997) that condition rule probabilities on head words, parent-annotated PCFGs (Johnson, 1998) that refine nonterminals with parent information, and latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006) that automatically learn fine-grained nonterminal subcategories. These extensions dramatically improve parsing accuracy, with the Berkeley Parser achieving over 90% F1 using split-merge latent-variable training. PCFGs remain foundational in computational linguistics even as neural approaches have become dominant.

Interactive Calculator

Enter CNF grammar rules (one per line, format A -> B C or A -> a) followed by a blank line and a sentence (words separated by spaces). The CYK algorithm builds a parse table and determines whether the sentence is in the language.
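A Boolean CYK recognizer along the lines described here can be sketched as follows; the rule-text parsing and the example grammar are illustrative assumptions:

```python
def cyk_recognize(rules_text, sentence, start="S"):
    """CYK membership test for a CNF grammar given as 'A -> B C' / 'A -> a' lines."""
    binary, lexical = [], []
    for line in rules_text.strip().splitlines():
        lhs, rhs = (part.strip() for part in line.split("->"))
        symbols = rhs.split()
        if len(symbols) == 2:
            binary.append((lhs, symbols[0], symbols[1]))
        else:
            lexical.append((lhs, symbols[0]))
    words = sentence.split()
    n = len(words)
    # table[i][j] holds the nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = {A for A, a in lexical if a == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return start in table[0][n]

grammar = """S -> NP VP
VP -> V NP
NP -> she
NP -> her
V -> saw"""
print(cyk_recognize(grammar, "she saw her"))  # True
```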

Click Calculate to see the results.

References

  1. Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. Proceedings of the 14th National Conference on Artificial Intelligence (AAAI), 598–603.
  2. Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4), 589–637. https://doi.org/10.1162/089120103322753356
  3. Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613–632.
  4. Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. Proceedings of COLING-ACL 2006, 433–440. https://doi.org/10.3115/1220175.1220230
