Chapter 11.02: How to reduce memory and compute?
Here you will learn about ways to reduce the memory and compute requirements of big models. We introduce distributed training, in which you make use of data and tensor parallelism, and FlashAttention, a method for computing attention more efficiently; a short illustrative sketch of the latter follows below.
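As a preview, the sketch below (not part of the chapter material) shows one way a FlashAttention-style fused kernel can be invoked from PyTorch, via `torch.nn.functional.scaled_dot_product_attention`; the tensor shapes are illustrative assumptions.

```python
# Minimal sketch: memory-efficient attention through PyTorch's fused
# scaled_dot_product_attention, which can dispatch to a FlashAttention
# kernel on supported GPUs. Shapes below are made-up examples.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64  # assumed sizes
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Fused attention: the full (seq_len x seq_len) score matrix is not
# materialized, unlike a naive softmax(Q K^T) V implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```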