Chapters
-
Chapter 0: Machine Learning Basics
This chapter introduces the basic concepts of Machine Learning. We therefore rely on the excellent material from the I2ML course, which already comes with videos and has been taught at LMU numerous times. The focus of these chapters is on introducing supervised learning, explaining the difference between regression and classification, showing how to evaluate and compare Machine Learning models, and formalizing the concept of learning in general. When taking our DL4NLP course, you do not necessarily have to re-watch all of the videos if you already have proficient knowledge in this area. Nevertheless, the concepts explained here form the basis upon which we build our course, and thus we expect every student to be familiar with the content.
-
Chapter 1: Introduction to the course
In this chapter, you will dive into the fundamental principles of Deep Learning for Natural Language Processing (NLP). You will explore key concepts including learning paradigms, various tasks within NLP, the neural probabilistic language model, and the significance of embeddings.
-
Chapter 2: Deep Learning Basics
In this chapter we explore fundamental concepts like Recurrent Neural Networks (RNNs), the attention mechanism, ELMo embeddings, and tokenization. Each concept serves as a building block in understanding how neural networks can comprehend and generate human language.
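To fix intuition before the chapter, here is a minimal sketch of a single step of a vanilla (Elman) RNN; the weight names and dimensions are illustrative assumptions rather than the chapter's exact notation:

import torch

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    # One step of a vanilla RNN: the new hidden state mixes the current
    # input with the previous hidden state through a shared set of weights.
    return torch.tanh(x_t @ w_xh + h_prev @ w_hh + b_h)

d_in, d_hidden = 8, 16
x_t = torch.randn(d_in)                 # toy input vector at time step t
h_prev = torch.zeros(d_hidden)          # previous hidden state
w_xh = torch.randn(d_in, d_hidden)
w_hh = torch.randn(d_hidden, d_hidden)
b_h = torch.zeros(d_hidden)
h_t = rnn_step(x_t, h_prev, w_xh, w_hh, b_h)   # new hidden state, shape (16,)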
-
Chapter 3: Transformer
The Transformer, as introduced in [1], is a deep learning model architecture specifically designed for sequence-to-sequence tasks in natural language processing. It revolutionized NLP by replacing recurrent layers with self-attention mechanisms, enabling it to process entire sequences in parallel and thereby overcoming the limitations of sequential processing in traditional RNN-based models such as LSTMs. This architecture has become the foundation for state-of-the-art models in various NLP tasks such as machine translation, text summarization, and language understanding. In this chapter, we first introduce the Transformer, explore its main components (the encoder and the decoder), and finally discuss ways to improve the architecture, such as Transformer-XL and efficient Transformers.
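To make the idea of self-attention more concrete, here is a minimal sketch of single-head scaled dot-product self-attention in PyTorch; the function name, tensor shapes, and random projections are illustrative assumptions rather than the exact implementation from [1]:

import torch
import torch.nn.functional as F

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_k) projection matrices
    q = x @ w_q          # queries
    k = x @ w_k          # keys
    v = x @ w_v          # values
    d_k = q.size(-1)
    scores = q @ k.T / d_k ** 0.5        # pairwise similarities of all positions, computed in parallel
    weights = F.softmax(scores, dim=-1)  # attention distribution for each position
    return weights @ v                   # each output is a weighted sum of the values

seq_len, d_model, d_k = 5, 16, 8
x = torch.randn(seq_len, d_model)                       # toy input sequence
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = scaled_dot_product_self_attention(x, w_q, w_k, w_v)   # shape (5, 8)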
-
Chapter 4: BERT
BERT (Bidirectional Encoder Representations from Transformers) [1] is a transformer-based model designed to generate deep contextualized representations of words by considering bidirectional context, which allows it to capture complex linguistic patterns and context-dependent meanings. It achieves this by pretraining on large text corpora using masked language modeling and next sentence prediction objectives, enabling it to learn rich word representations that incorporate both left and right context.
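As a rough illustration of the masked language modeling objective, the sketch below constructs BERT-style MLM training inputs; the 15% masking rate and the 80/10/10 replacement split follow the paper, while the toy vocabulary, tensors, and the helper name mask_tokens are assumptions:

import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    # BERT-style masking: select ~15% of positions as prediction targets.
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100   # non-selected positions are ignored by the loss (PyTorch's default ignore_index)
    # Of the selected positions: 80% become [MASK], 10% a random token, 10% stay unchanged.
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id
    random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[random] = torch.randint(vocab_size, input_ids.shape)[random]
    return input_ids, labels

ids = torch.randint(5, 100, (1, 12))   # toy batch of token IDs
masked_ids, labels = mask_tokens(ids, mask_token_id=103, vocab_size=100)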
-
Chapter 5: Post-BERT Era
Creating BERT-based models with modifications to pretraining involves adjusting the pretraining objectives or the architecture to suit specific tasks or domains. This process typically begins by designing custom pretraining objectives or modifying existing ones to capture domain-specific characteristics or to improve model performance on targeted tasks. These modified pretraining objectives can include variations of masked language modeling (MLM), next sentence prediction (NSP), or other self-supervised learning tasks tailored to the needs of the target domain. After pretraining, the model is fine-tuned on downstream tasks using task-specific data and objectives, enabling it to adapt its learned representations to the specific requirements of those tasks. In this chapter you will learn about three different cases where the existing BERT model has been modified, namely RoBERTa [1], ALBERT [2], and DistilBERT [3].
-
Chapter 6: Post-BERT Era II and using the Transformer
Here we further introduce models from the Post-BERT era, such as ELECTRA and XLNet. Additionally, we explore the concept of restructuring tasks into a text-to-text format and present the T5 model as a prime example.
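To illustrate the text-to-text idea, here is a small sketch of how different tasks can be cast as plain "input text to output text" pairs with a task prefix, in the spirit of T5; the concrete prefixes and examples are illustrative, not taken verbatim from the paper:

# Every task is expressed as text in, text out, distinguished only by a prefix.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews after severe weather ...", "emergency crews sent after storm ..."),
    ("cola sentence: The course is jumping well.", "unacceptable"),
]
for source, target in examples:
    print(f"input : {source}")
    print(f"label : {target}\n")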
-
Chapter 7: Generative Pre-Trained Transformers
In this chapter, you will learn about the evolution of the GPT series, spanning from GPT-1 to GPT-3, which revolutionized natural language processing by employing generative Transformer architectures pre-trained on massive text corpora to generate contextually relevant text.
Additional Resources: Overview of GPT-1 to GPT-3
-
Chapter 8: Decoding Strategies
This chapter covers various decoding strategies. You will learn about deterministic methods (greedy search, beam search, contrastive search, contrastive decoding) and stochastic methods (top-k sampling, top-p sampling, and sampling with temperature). The chapter also covers evaluation metrics for open-ended text generation.
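As a minimal illustration of the stochastic methods, the sketch below samples a next token from a vector of toy logits using temperature and top-k filtering; the logit values and the helper name sample_next_token are assumptions, not tied to any particular model:

import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0, top_k=None):
    # Temperature < 1 sharpens the distribution, > 1 flattens it; greedy search
    # would simply take logits.argmax() instead of sampling.
    logits = logits / temperature
    if top_k is not None:
        # Keep only the k most likely tokens and mask out the rest.
        kth_value = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])   # toy next-token logits
token_id = sample_next_token(logits, temperature=0.8, top_k=3)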
-
Chapter 9: Large Language Models (LLMs)
In this chapter we cover LLM concepts such as Instruction Fine-Tuning and Chain-of-Thought prompting, and discuss the possibility of emergent abilities of LLMs.
-
Chapter 10: Reinforcement Learning from Human Feedback (RLHF)
In the context of natural language processing (NLP), RLHF (Reinforcement Learning from Human Feedback) involves training language models to generate text or perform tasks based on evaluative signals provided by human annotators or users. This technique allows NLP models to learn from human feedback, such as ratings or corrections, to improve their language understanding, generation, or task performance. By iteratively adjusting model parameters to maximize the reward signal derived from human feedback, RLHF enables models to adapt to specific preferences or requirements, leading to more accurate and contextually relevant outputs in various NLP applications.
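The core idea can be sketched with a simple policy-gradient (REINFORCE-style) loss that scales the log-likelihood of a generated output by its reward; real RLHF pipelines additionally use a learned reward model, a KL penalty against the pretrained policy, and PPO-style updates, so the snippet below is only a toy illustration with assumed values:

import torch

def reinforce_loss(token_logprobs, reward, baseline=0.0):
    # Scale the log-likelihood of the generated tokens by (reward - baseline)
    # so that high-reward outputs become more likely under the policy.
    return -(reward - baseline) * token_logprobs.sum()

logprobs = torch.tensor([-1.2, -0.7, -2.3], requires_grad=True)  # toy per-token log-probs
loss = reinforce_loss(logprobs, reward=0.9, baseline=0.5)
loss.backward()   # gradients push the model toward this (rewarded) output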
-
Chapter 11: Training Large Language Models
In this chapter we cover multiple concepts related to training LLMs. You will learn about Transformer computation and scaling laws. In the second part of the chapter, we discuss how to optimize LLM performance.
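As a back-of-the-envelope example of reasoning about Transformer training compute, the scaling-laws literature commonly uses the rule of thumb C ≈ 6·N·D (roughly 6 FLOPs per parameter per training token); the sketch below applies it to a toy model size and token budget, and the exact constant varies with architecture and implementation:

def approx_training_flops(n_params, n_tokens):
    # Rule of thumb: forward + backward pass costs about 6 FLOPs
    # per parameter per training token, so C ~= 6 * N * D.
    return 6 * n_params * n_tokens

# Toy example: a 1B-parameter model trained on 20B tokens
flops = approx_training_flops(1e9, 20e9)
print(f"~{flops:.2e} FLOPs")   # ~1.20e+20 FLOPs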
-
Chapter 12: Multilinguality
Multilinguality in NLP refers to the ability of models to understand and generate text across multiple languages, enabling more inclusive and versatile applications. This is achieved by training models on diverse multilingual datasets, allowing them to learn language-agnostic representations and transfer knowledge between languages. In this chapter you will learn about cross-lingual embeddings and multilingual transformers, which enable models to perform tasks like translation or text classification across different languages.
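One classic way to obtain cross-lingual embeddings is to align two monolingual embedding spaces with an orthogonal mapping learned from a bilingual dictionary, which has a closed-form solution via SVD (orthogonal Procrustes); the sketch below uses random toy vectors in place of real embeddings:

import torch

def procrustes_mapping(src_emb, tgt_emb):
    # Orthogonal Procrustes: find the rotation W that best maps source-language
    # vectors onto their translations in the target space.
    u, _, vt = torch.linalg.svd(src_emb.T @ tgt_emb)
    return u @ vt

src = torch.randn(1000, 300)   # toy source-language embeddings of dictionary entries
tgt = torch.randn(1000, 300)   # toy target-language embeddings of their translations
w = procrustes_mapping(src, tgt)
mapped = src @ w               # source words projected into the target space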