Chapter 11: Training Large Language Models
In this chapter we cover several concepts related to training LLMs. You will learn about the compute and memory requirements of Transformers, ways to reduce them, and scaling laws. In the last part we discuss how to optimize LLM performance.
-
Chapter 11.01: Memory and compute requirements
Large language models (LLMs) require significant compute and memory resources due to their vast number of parameters and complex architectures. In this chapter you will learn which operations dominate the compute requirements and how the components of model size determine the memory requirements.
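As a rough orientation for the numbers discussed in this chapter, the sketch below uses two common rules of thumb that are not taken from the chapter text itself: roughly 16 bytes of model state per parameter for mixed-precision Adam training (fp16 weights and gradients, fp32 master weights, and two fp32 optimizer moments), and total training compute of about 6 FLOPs per parameter per token. Both are approximations that ignore activations, buffers, and implementation details.

```python
# Back-of-the-envelope estimates for LLM training memory and compute.
# Assumed rules of thumb (not exact, see lead-in):
#  - mixed-precision Adam training: ~16 bytes of model state per parameter
#  - total training compute: C ~= 6 * N * D FLOPs (N params, D tokens)

def training_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Memory for model states only (ignores activations and fragmentation)."""
    return num_params * bytes_per_param / 1e9

def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate total training FLOPs via the 6*N*D rule."""
    return 6 * num_params * num_tokens

if __name__ == "__main__":
    N = 7e9      # hypothetical 7B-parameter model
    D = 1e12     # trained on 1T tokens
    print(f"model states: ~{training_memory_gb(N):.0f} GB")
    print(f"training compute: ~{training_flops(N, D):.2e} FLOPs")
```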
-
Chapter 11.02: How to reduce memory and compute?
Here you will learn about ways to reduce the memory and compute requirements of large models. We introduce distributed training, where you make use of data and tensor parallelism, and FlashAttention, a method to perform attention more efficiently.
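To make the idea of tensor parallelism concrete, here is a minimal NumPy sketch of a column-parallel linear layer. Real implementations shard the weights across GPUs and use collective communication; the "devices" below are just array slices, so this only illustrates the arithmetic, not the systems side.

```python
import numpy as np

# Minimal illustration of tensor (column) parallelism for one linear layer.
# Each simulated "device" holds a slice of the weight matrix's columns.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512))          # batch of activations
W = rng.standard_normal((512, 2048))       # full weight matrix

n_devices = 4
shards = np.split(W, n_devices, axis=1)    # each shard: 512 x 512 columns

# Each device computes its partial output independently ...
partial_outputs = [x @ W_shard for W_shard in shards]

# ... and gathering along the feature dimension reconstructs the full result.
y_parallel = np.concatenate(partial_outputs, axis=1)

assert np.allclose(y_parallel, x @ W)      # matches the unsharded computation
```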
-
Chapter 11.03: Scaling Laws and Chinchilla
In this chapter we introduce scaling laws, which relate model performance to model size, dataset size, and compute, and discuss the compute-optimal training results known as Chinchilla.
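The snippet below is a small worked example of compute-optimal sizing in the Chinchilla spirit. It assumes the usual approximation C ≈ 6·N·D for training compute and the widely quoted rule of thumb of roughly 20 training tokens per parameter; the actual Chinchilla coefficients come from fitted scaling laws, so treat these numbers as ballpark figures only.

```python
# Chinchilla-style compute-optimal sizing under two approximations:
#  - training compute: C ~= 6 * N * D
#  - compute-optimal ratio: ~20 training tokens per parameter (rule of thumb)

def compute_optimal(flop_budget: float, tokens_per_param: float = 20.0):
    """Return (N_opt, D_opt) for a FLOP budget, given C = 6*N*D and D = k*N."""
    n_opt = (flop_budget / (6.0 * tokens_per_param)) ** 0.5
    d_opt = tokens_per_param * n_opt
    return n_opt, d_opt

if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal(budget)
        print(f"C={budget:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```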
-
Chapter 11.04: LLM Optimization
In this chapter we discuss ways to optimize the performance of large language models, with methods such as prompt engineering as well as approaches that go beyond it.
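As a simple illustration of prompt engineering, the sketch below builds a few-shot prompt for sentiment classification. The task, examples, and template are made up for illustration; in practice the resulting string would be sent to an LLM API or a locally hosted model, which is omitted here.

```python
# Toy few-shot prompt construction for sentiment classification.
# The examples and template are hypothetical and only illustrate the pattern.

FEW_SHOT_EXAMPLES = [
    ("The plot was predictable and the acting wooden.", "negative"),
    ("An absolute delight from start to finish.", "positive"),
]

def build_prompt(review: str) -> str:
    lines = ["Classify the sentiment of each movie review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_prompt("I would happily watch this again."))
```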