Michael Collins

Michael Collins (b. 1967) is a computational linguist at Columbia University who made foundational contributions to statistical parsing, discriminative training methods for NLP, and structured prediction, developing some of the most accurate parsers of the early statistical NLP era.

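Linear model over parse features (as used in Collins's reranking and perceptron work): T* = argmax_T w · f(T), where f(T) is the feature vector of candidate parse T and w is the weight vector.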

Collins is an Irish-American computer scientist. His PhD thesis on statistical parsing introduced head-driven models that achieved state-of-the-art accuracy on the Penn Treebank, and his later work on discriminative models and structured prediction has been among the most influential in computational linguistics.

Early Life and Education

Born in Dublin, Ireland, in 1967, Collins studied computer science and mathematics at University College Dublin before earning his PhD from the University of Pennsylvania in 1999 under Mitchell Marcus. His dissertation on statistical parsing models was immediately recognised as a landmark contribution. He held positions at AT&T Labs and MIT before joining Columbia University.

1967: Born in Dublin, Ireland
1999: Completed PhD at the University of Pennsylvania
1999: Published influential head-driven statistical parsing models
2002: Developed the structured perceptron for NLP
2004: Introduced parameter estimation methods using large-margin training
2011: Received the ACL Fellowship

Key Contributions

Collins's head-driven statistical parsers (Models 1, 2, and 3) used lexicalised probabilistic context-free grammars where the probability of each rule depended on the head word of the phrase. By conditioning on head words, these models captured crucial selectional preferences and subcategorisation information that unlexicalised models missed, achieving dramatic improvements in parsing accuracy on the Penn Treebank.
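The effect of conditioning on the head word can be illustrated with a minimal sketch. The toy counts, the function names, and the plain relative-frequency estimates below are assumptions made for illustration; Collins's actual models break each rule into a head child plus left and right modifiers and smooth the estimates with back-off, none of which is reproduced here.

```python
# Hypothetical toy observations: (parent label, head word, expansion).
observations = [
    ("VP", "eat", "V NP"),
    ("VP", "eat", "V NP"),
    ("VP", "eat", "V"),
    ("VP", "sleep", "V"),
    ("VP", "sleep", "V"),
    ("VP", "sleep", "V PP"),
]

def unlexicalised_prob(expansion, parent, data):
    """Relative-frequency estimate of P(expansion | parent)."""
    total = sum(1 for p, _, _ in data if p == parent)
    hits = sum(1 for p, _, e in data if p == parent and e == expansion)
    return hits / total if total else 0.0

def lexicalised_prob(expansion, parent, head, data):
    """Relative-frequency estimate of P(expansion | parent, head word)."""
    total = sum(1 for p, h, _ in data if p == parent and h == head)
    hits = sum(
        1 for p, h, e in data if (p, h, e) == (parent, head, expansion)
    )
    return hits / total if total else 0.0

# Without the head word, every VP gets the same chance of taking an object.
p_plain = unlexicalised_prob("V NP", "VP", observations)
# With the head word, "eat" and "sleep" get different estimates.
p_eat = lexicalised_prob("V NP", "VP", "eat", observations)
p_sleep = lexicalised_prob("V NP", "VP", "sleep", observations)
print(p_plain, p_eat, p_sleep)
```

In this toy data the unlexicalised estimate cannot tell a transitive verb from an intransitive one, while the head-conditioned estimate can, which is the kind of subcategorisation information the paragraph above describes.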

He introduced the structured perceptron for NLP tasks, adapting the classical perceptron algorithm to work with structured outputs such as parse trees, tag sequences, and translation hypotheses. This provided a simple, effective alternative to maximum entropy and conditional random field models. His work on discriminative reranking — training a model to select the best parse from an n-best list produced by a generative parser — introduced powerful feature engineering techniques and demonstrated the value of discriminative training for structured prediction.
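The perceptron update for structured outputs can be sketched for sequence tagging as follows. This is a simplified illustration under stated assumptions: the feature set (word/tag and tag/tag pairs), the toy data, and the names `features`, `viterbi`, and `train` are all hypothetical, and weight averaging, which is commonly used in practice, is omitted.

```python
from collections import defaultdict

def features(words, tags):
    """Count emission (word, tag) and transition (prev_tag, tag) features."""
    feats = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats

def viterbi(words, tagset, w):
    """Return the highest-scoring tag sequence for a non-empty sentence."""
    # best[i][t] = (score of best path ending in tag t at position i, backpointer)
    best = [{}]
    for t in tagset:
        best[0][t] = (w[("emit", words[0], t)] + w[("trans", "<s>", t)], None)
    for i in range(1, len(words)):
        best.append({})
        for t in tagset:
            score, prev = max(
                (best[i - 1][p][0] + w[("trans", p, t)], p) for p in tagset
            )
            best[i][t] = (score + w[("emit", words[i], t)], prev)
    # Follow backpointers from the best final tag.
    tag = max(tagset, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def train(data, tagset, epochs=5):
    """On each mistake, add the gold features and subtract the predicted ones."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != gold:
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w

# Hypothetical toy training set: determiner (D), noun (N), verb (V).
tagset = ["D", "N", "V"]
data = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
    (["a", "cat", "sleeps"], ["D", "N", "V"]),
    (["dogs", "bark"], ["N", "V"]),
]
weights = train(data, tagset)
print(viterbi(["the", "dog", "barks"], tagset, weights))
```

The same update applies to reranking: replace the Viterbi search with an argmax over an n-best list of candidate parses, and compare the feature vector of the best-scoring candidate with that of the correct one.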

"The key to good parsing is capturing the right statistical dependencies — and head words are the most important single source of information for syntactic disambiguation." — Michael Collins

Legacy

Collins's parsing models set accuracy records that stood for years and influenced virtually all subsequent work on statistical parsing. The structured perceptron became a standard tool for NLP practitioners. His clear technical writing and tutorials on discriminative models trained a generation of NLP researchers. The methods he developed paved the way for modern neural approaches to structured prediction.

