Chapter 07.02: GPT-2 (2019)

GPT-2 [1] builds upon its predecessor with a larger model (up to 1.5 billion parameters), more training data (WebText, roughly 40 GB of text scraped from outbound Reddit links), and minor architectural refinements, such as moving layer normalization to the input of each sub-block and extending the context window from 512 to 1024 tokens. Like GPT-1, GPT-2 uses a decoder-only generative transformer architecture; the increased scale leads to markedly better language understanding and generation. In contrast to GPT-1, however, GPT-2 is evaluated without any task-specific fine-tuning: its central finding is that a sufficiently large language model can perform downstream tasks zero-shot, conditioned only on a natural-language prompt.
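A quick way to see this zero-shot behavior is to sample from a pretrained GPT-2 checkpoint. The sketch below uses the Hugging Face transformers library, which is an illustration choice and not part of the original chapter; the model id "gpt2" (the smallest, 124M-parameter variant) and the translation-style prompt are likewise assumptions made for the example.

```python
# Minimal sketch: zero-shot generation from a pretrained GPT-2
# checkpoint via the Hugging Face `transformers` library
# (an illustration, not the chapter's own code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # 124M-parameter variant
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# The "task" is specified purely through a natural-language prompt:
# no fine-tuning, no task-specific head.
prompt = "Translate English to French: cheese =>"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=40,        # total length including the prompt
        do_sample=True,       # sample instead of greedy decoding
        top_k=40,             # restrict sampling to the 40 most likely tokens
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The small 124M checkpoint will only produce rough completions for a prompt like this; the larger released variants ("gpt2-medium", "gpt2-large", "gpt2-xl") follow the same interface and illustrate how zero-shot performance improves with scale.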

Lecture Slides

References

[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Blog.