
Jeffrey Pennington

Jeffrey Pennington is an American computer scientist who co-developed GloVe (Global Vectors for Word Representation), a word embedding method that combines the advantages of global matrix factorisation and local context window methods.

GloVe: wᵢᵀw̃ⱼ + bᵢ + b̃ⱼ = log(Xᵢⱼ)

Jeffrey Pennington is a researcher who, while at Stanford University, co-developed the GloVe word embedding model with Richard Socher and Christopher Manning. GloVe provided a mathematically principled alternative to Word2Vec by factorising the logarithm of the word co-occurrence matrix, producing word vectors that captured both semantic and syntactic regularities and became one of the two most widely used pre-trained word embedding models.

Early Life and Education

Pennington studied physics and applied mathematics, bringing a strong background in mathematical modelling and optimisation to the problem of word representation learning. His work at Stanford combined insights from count-based distributional semantics with neural embedding methods.

2014

Published "GloVe: Global Vectors for Word Representation" at EMNLP

2014

Released pre-trained GloVe vectors trained on Common Crawl and Wikipedia

2014

GloVe achieved state-of-the-art results on word analogy and NER tasks

Key Contributions

GloVe (Global Vectors) learns word embeddings by factorising the logarithm of the word–word co-occurrence matrix. The key insight is that ratios of co-occurrence probabilities with various probe words encode meaning: P(solid | ice)/P(solid | steam) is large while P(gas | ice)/P(gas | steam) is small, so these ratios distinguish ice-related from steam-related concepts. GloVe minimises the weighted least-squares objective J = Σᵢⱼ f(Xᵢⱼ)(wᵢᵀw̃ⱼ + bᵢ + b̃ⱼ − log Xᵢⱼ)², where Xᵢⱼ is the co-occurrence count, w̃ⱼ and b̃ⱼ are the context-word vector and bias, and f is a weighting function that caps the influence of very frequent co-occurrences.
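The objective above can be sketched end to end on a toy corpus. This is an illustrative reimplementation with made-up hyperparameters (window size, dimension, learning rate, epoch count), not the authors' reference code, which uses AdaGrad and a much larger corpus:

```python
import numpy as np

# Toy corpus and vocabulary (illustrative only).
corpus = "ice is solid steam is gas ice is cold steam is hot".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, window = len(vocab), 2

# Symmetric co-occurrence counts X_ij over a local window,
# weighted by 1/distance as in the paper.
X = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            X[idx[w], idx[corpus[j]]] += 1.0 / abs(j - i)

def f(x, x_max=10.0, alpha=0.75):
    # Weighting function: down-weights rare pairs, caps frequent ones.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

rng = np.random.default_rng(0)
d, lr = 8, 0.05
W = rng.normal(0, 0.1, (V, d))    # word vectors w_i
Wt = rng.normal(0, 0.1, (V, d))   # context vectors w̃_j
b, bt = np.zeros(V), np.zeros(V)  # biases b_i, b̃_j

# SGD on the weighted least-squares objective over nonzero entries of X.
nz = np.argwhere(X > 0)
for epoch in range(200):
    for i, j in nz:
        diff = W[i] @ Wt[j] + b[i] + bt[j] - np.log(X[i, j])
        g = f(X[i, j]) * diff
        W[i], Wt[j] = W[i] - lr * g * Wt[j], Wt[j] - lr * g * W[i]
        b[i] -= lr * g
        bt[j] -= lr * g

vectors = W + Wt  # the paper sums word and context vectors at the end
```

Note that the sum over the objective runs only over nonzero Xᵢⱼ, which is what makes training efficient: the co-occurrence matrix is sparse, so cost scales with the number of observed pairs rather than V².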

Unlike Word2Vec, which operates on local context windows, GloVe explicitly leverages global co-occurrence statistics, combining the advantages of count-based methods (which use global statistics efficiently) with those of prediction-based methods (which produce dense, low-dimensional vectors). The resulting vectors perform competitively with or better than Word2Vec on word analogy, similarity, and named entity recognition tasks.

"GloVe is designed to capture both the global statistical information and the fine-grained local patterns that make word vectors useful for downstream tasks." — Pennington et al., "GloVe: Global Vectors for Word Representation" (2014)

Legacy

Pre-trained GloVe vectors (trained on 6 billion tokens of Wikipedia and Gigaword, or 840 billion tokens of Common Crawl) became standard resources used by thousands of NLP researchers and practitioners. GloVe's mathematical analysis of the relationship between co-occurrence statistics and word vector properties deepened understanding of why word embeddings work and influenced subsequent work on embedding methods. Together with Word2Vec, GloVe defined the pre-trained embedding era that preceded contextualised models like BERT.
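The pre-trained releases use a simple text format: one token per line followed by its vector components. A minimal loader and the classic analogy test can be sketched as follows; the four vectors below are made-up toy values standing in for real pre-trained weights:

```python
import io
import numpy as np

# Toy data in the GloVe text format (token, then floats, space-separated).
# These numbers are fabricated for illustration, not real GloVe weights.
glove_txt = """\
king 0.8 0.9 0.1
queen 0.8 0.9 0.8
man 0.7 0.2 0.1
woman 0.7 0.2 0.8
"""

def load_glove(fh):
    """Parse a file-like object in GloVe text format into a dict."""
    vecs = {}
    for line in fh:
        tok, *vals = line.split()
        vecs[tok] = np.array(vals, dtype=float)
    return vecs

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

vecs = load_glove(io.StringIO(glove_txt))

# Classic analogy: king - man + woman should land nearest "queen".
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w != "king"),
           key=lambda w: cosine(target, vecs[w]))
# → best == "queen"
```

With a real release, `io.StringIO(glove_txt)` would be replaced by an open file handle over, e.g., the 6B-token vectors.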



References

  1. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of EMNLP, 1532–1543.
  2. Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems, 27, 2177–2185.
  3. Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the ACL, 238–247.
  4. Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
