
Ashish Vaswani

Ashish Vaswani is a computer scientist who was the first author of the 2017 paper "Attention Is All You Need," which introduced the Transformer architecture and fundamentally reshaped natural language processing and artificial intelligence.

Attention(Q,K,V) = softmax(QKᵀ / √dₖ)V

Ashish Vaswani is a machine learning researcher who, while at Google Brain, led the development of the Transformer architecture. The 2017 paper "Attention Is All You Need," of which Vaswani was the first author, introduced a purely attention-based sequence transduction model that replaced recurrent and convolutional layers entirely. This architecture became the foundation for virtually all subsequent large language models, including BERT, GPT, T5, and their descendants.

Early Life and Education

Vaswani studied at the Indian Institute of Technology and later earned advanced degrees in computer science in the United States. He joined Google Brain, where he worked on sequence modelling and neural machine translation, collaborating with Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin on the Transformer paper.

2017: Published "Attention Is All You Need" at NIPS (renamed NeurIPS in 2018), introducing the Transformer

2017: Transformer achieved state-of-the-art results on English-German and English-French machine translation

2022: Co-founded Adept AI

2023: Co-founded Essential AI

Key Contributions

The Transformer architecture replaces recurrence with multi-head self-attention, allowing each position in a sequence to attend to all other positions in parallel. The scaled dot-product attention mechanism computes Attention(Q,K,V) = softmax(QK^T / sqrt(d_k))V, where Q (queries), K (keys), and V (values) are linear projections of the input. Multi-head attention runs this mechanism multiple times in parallel with different learned projections, capturing different types of relationships.
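As a minimal sketch (not the authors' implementation), the scaled dot-product attention described above can be written in NumPy; the matrices Q, K, and V here are random placeholders standing in for the learned linear projections of the input:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # (seq_len, d_v) weighted sum of values

# Toy example: 3 positions, d_k = d_v = 4 (illustrative sizes only)
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The 1/√dₖ scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients; multi-head attention simply runs this routine h times on h different projections and concatenates the results.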

The Transformer introduced positional encodings to inject sequence order information without recurrence, and its encoder-decoder structure with layer normalisation, residual connections, and feed-forward sub-layers established the architectural template used by all subsequent large language models. The ability to process all positions in parallel made Transformers dramatically more efficient to train than RNNs, enabling the scaling of models to billions of parameters.
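The sinusoidal positional encodings used in the original paper can be sketched as follows; position pos and channel pair 2i map to sin and cos waves of geometrically increasing wavelength (the sequence length and model width below are arbitrary illustrative values):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even channels
    pe[:, 1::2] = np.cos(angles)                 # odd channels
    return pe

pe = sinusoidal_positional_encoding(50, 16)
print(pe.shape)  # (50, 16)
```

These encodings are added to the token embeddings before the first layer, so a model with no recurrence still receives order information; because the waves are fixed functions of position, the scheme extrapolates to sequence lengths not seen during training.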

"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely." — Vaswani et al., "Attention Is All You Need" (2017)

Legacy

The Transformer is arguably the most consequential single architecture in the history of deep learning for NLP. It enabled BERT, GPT, T5, and all subsequent foundation models. The paper has been cited over 100,000 times and the architecture has been adopted not only in NLP but in computer vision, speech processing, protein folding, and virtually every area of machine learning. Vaswani's subsequent work has focused on building AI companies that leverage Transformer-based models for practical applications.



References

  1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.
  2. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT, 4171–4186.
  3. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Technical Report.
  4. Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open, 3, 111–132. doi:10.1016/j.aiopen.2022.10.001
