Chapter 07.02: GPT-2 (2019)

GPT-2 [1] builds upon its predecessor with a larger model (up to 1.5 billion parameters), more training data (WebText, roughly 40 GB of text scraped from outbound Reddit links), and minor architectural refinements, such as moving layer normalization to the input of each sub-block and extending the context window from 512 to 1024 tokens. Like GPT-1, GPT-2 uses a decoder-only generative transformer architecture; the increased scale leads to markedly better language understanding and generation. In contrast to GPT-1, however, GPT-2 is evaluated without any task-specific fine-tuning: its central finding is that a sufficiently large language model can perform downstream tasks zero-shot, conditioned only on a natural-language prompt.
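A quick way to see this zero-shot behavior is to sample from a pretrained GPT-2 checkpoint. The sketch below uses the Hugging Face transformers library, which is an illustration choice and not part of the original chapter; the model id "gpt2" (the smallest, 124M-parameter variant) and the translation-style prompt are likewise assumptions made for the example.

```python
# Minimal sketch: zero-shot generation from a pretrained GPT-2
# checkpoint via the Hugging Face `transformers` library
# (an illustration, not the chapter's own code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # 124M-parameter variant
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# The "task" is specified purely through a natural-language prompt:
# no fine-tuning, no task-specific head.
prompt = "Translate English to French: cheese =>"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=40,        # total length including the prompt
        do_sample=True,       # sample instead of greedy decoding
        top_k=40,             # restrict sampling to the 40 most likely tokens
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The small 124M checkpoint will only produce rough completions for a prompt like this; the larger released variants ("gpt2-medium", "gpt2-large", "gpt2-xl") follow the same interface and illustrate how zero-shot performance improves with scale.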

Lecture Slides

References

[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Blog.