Chapter 11.02: How to reduce memory and compute?
Here you will learn about ways to reduce the memory and compute requirements of big models. We introduce distributed training, in which you make use of data and tensor parallelism, and FlashAttention, a method for computing attention more efficiently; a short illustrative sketch of the latter follows below.
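As a preview, the sketch below (not part of the chapter material) shows one way a FlashAttention-style fused kernel can be invoked from PyTorch, via `torch.nn.functional.scaled_dot_product_attention`; the tensor shapes are illustrative assumptions.

```python
# Minimal sketch: memory-efficient attention through PyTorch's fused
# scaled_dot_product_attention, which can dispatch to a FlashAttention
# kernel on supported GPUs. Shapes below are made-up examples.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64  # assumed sizes
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Fused attention: the full (seq_len x seq_len) score matrix is not
# materialized, unlike a naive softmax(Q K^T) V implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```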