Tomas Mikolov is a Czech computer scientist whose work on word embeddings and recurrent neural network language models transformed how NLP systems represent and reason about word meaning. His Word2Vec models, developed at Google in 2013, demonstrated that simple neural networks trained on large text corpora produce word vectors with remarkable semantic properties, launching the modern era of distributed word representations.
Education and Career
Born in the Czech Republic, Mikolov studied at Brno University of Technology, where he developed recurrent neural network language models during his doctoral research. He then worked at Google Brain and later at Facebook AI Research (FAIR), making major contributions to word representation learning at both institutions.
- Developed recurrent neural network language models during doctoral research at Brno University of Technology
- Joined Google Brain
- Published "Efficient Estimation of Word Representations in Vector Space" (Word2Vec, 2013)
- Published "Distributed Representations of Words and Phrases and their Compositionality" (2013)
- Joined Facebook AI Research
- Co-developed FastText at Facebook
Key Contributions
Word2Vec introduced two efficient architectures for learning word embeddings: the Continuous Bag-of-Words (CBOW) model, which predicts a target word from its context, and the Skip-gram model, which predicts context words from a target word. Both use a shallow neural network trained on large corpora. The resulting word vectors exhibit striking algebraic properties: vec("king") - vec("man") + vec("woman") approximately equals vec("queen"), demonstrating that the vector space captures semantic relationships.
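The analogy arithmetic can be sketched in a few lines: compute vec("king") - vec("man") + vec("woman") and return the vocabulary word nearest to the result by cosine similarity. The 2-D vectors below are hand-built toys chosen so the relation holds by construction; real Word2Vec vectors are learned from corpora and typically have 100-300 dimensions.

```python
import math

# Toy 2-D embeddings constructed so the analogy holds by design;
# real Word2Vec embeddings are learned, not hand-crafted.
emb = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [2.0, 0.0],
    "queen": [2.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, vocab):
    """Word whose vector is closest to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("king", "man", "woman", emb))  # prints: queen
```

Excluding the query words from the candidate set matters in practice: with real embeddings, the vector nearest to vec("king") - vec("man") + vec("woman") is often vec("king") itself.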
Mikolov introduced training innovations, including negative sampling and subsampling of frequent words, that made Word2Vec practical to train on corpora of billions of words. At Facebook, he co-developed FastText, which extends Word2Vec by representing each word as a bag of character n-grams, improving handling of morphologically rich languages and out-of-vocabulary words. His earlier doctoral work on RNNLM (recurrent neural network language models) had already shown that recurrent networks could outperform the n-gram models then standard in language modeling.
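The subword idea can be illustrated with a minimal sketch of FastText-style character n-gram extraction (the `n_min`/`n_max` defaults below assume fastText's usual 3-6 range). In FastText, a word's vector is the sum of its n-gram vectors, which is what allows it to build a vector for a word never seen during training.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams in the style of FastText: the word is wrapped in
    boundary markers '<' and '>' so that prefixes and suffixes are
    distinguished from word-internal substrings. (FastText additionally
    keeps the whole wrapped word as one extra feature.)"""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]

# Trigrams of "where", as in the FastText paper's example:
print(char_ngrams("where", 3, 3))  # prints: ['<wh', 'whe', 'her', 'ere', 're>']
```

Because "her" as a standalone word becomes "\<her\>", it gets a different representation from the word-internal trigram "her" inside "where".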
"The word vectors capture many linguistic regularities: the vector for 'king' minus 'man' plus 'woman' gives a vector closest to 'queen'." — Tomas Mikolov et al., "Efficient Estimation of Word Representations in Vector Space" (2013)
Legacy
The Word2Vec papers were among the most influential of the 2010s, cited over 40,000 times. They made word embeddings accessible to every NLP practitioner and spawned an entire research area on distributed word representations. Pre-trained Word2Vec and FastText vectors remain widely used. The success of word embeddings paved the way for contextualised embeddings (ELMo, BERT) and demonstrated that simple prediction tasks on large corpora can yield rich linguistic knowledge.