Topic 07: Deep Recurrent Neural Networks
This chapter introduces Recurrent Neural Networks (RNNs), designed to process sequential data by retaining information over time. It covers the backpropagation through time (BPTT) algorithm for training RNNs, highlighting key challenges like exploding and vanishing gradients. To address these issues, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) are introduced as enhanced architectures with gating mechanisms that better manage information flow. In addition, the chapter briefly introduces more recent approaches for modelling sequence data such as attention and transformers.
-
Chapter 07.01: Introduction to RNNs
Recurrent neural networks (RNNs) are a widely used class of models for sequential data and have powered systems such as Apple’s Siri and Google’s voice search. Unlike feedforward networks, an RNN maintains an internal hidden state that acts as a memory of previous inputs, which makes it well suited for machine learning problems involving sequential data.
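As a rough illustration of the recurrence, here is a minimal NumPy sketch of a vanilla (Elman) RNN cell unrolled over one input sequence; the dimensions and variable names (`W_xh`, `W_hh`, `b_h`) are illustrative, not part of the course material:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions
input_dim, hidden_dim, seq_len = 4, 8, 10

# Parameters of a vanilla RNN cell
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (the "memory")
b_h = np.zeros(hidden_dim)

x = rng.normal(size=(seq_len, input_dim))  # one input sequence
h = np.zeros(hidden_dim)                   # initial hidden state

hidden_states = []
for t in range(seq_len):
    # The same weights are reused at every time step; h carries information forward
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)
    hidden_states.append(h)

print(np.stack(hidden_states).shape)  # (seq_len, hidden_dim)
```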
-
Chapter 07.02: Backpropagation in RNNs
In this section, we explain backpropagation through time (BPTT) for RNNs as well as the phenomenon of exploding and vanishing gradients.
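A minimal sketch, under the assumption of a tanh RNN cell with illustrative dimensions, of why gradients vanish or explode in BPTT: the gradient with respect to early hidden states contains a product of per-step Jacobians, whose norm tends to shrink or grow geometrically with sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, seq_len = 8, 50

W_hh = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim))
h = rng.normal(size=hidden_dim)

# In BPTT, dL/dh_0 involves the product of per-step Jacobians
#   J_t = diag(1 - h_t^2) @ W_hh   (for a tanh cell)
# Repeated multiplication makes the accumulated gradient norm
# vanish or explode roughly geometrically in the sequence length.
grad = np.eye(hidden_dim)
norms = []
for t in range(seq_len):
    h = np.tanh(W_hh @ h)                # forward step (inputs omitted for brevity)
    J = np.diag(1.0 - h ** 2) @ W_hh     # Jacobian dh_t / dh_{t-1}
    grad = J @ grad                      # accumulate the product of Jacobians
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the norm typically decays (or explodes) over time
```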
-
Chapter 07.03: Modern RNNs
In this subchapter, we explain how modern RNN variants such as the LSTM and the GRU use gating mechanisms to mitigate the problem of exploding and vanishing gradients, and how Bidirectional RNNs process sequences in both directions.
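To illustrate the gating idea, here is a minimal NumPy sketch of a single LSTM step; the packed weight layout and dimensions are an assumption for brevity, not the course's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*hidden, input+hidden), b has shape (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])    # forget gate: what to erase from the cell state
    i = sigmoid(z[1 * hidden:2 * hidden])    # input gate: what new information to write
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate: what to expose as hidden state
    g = np.tanh(z[3 * hidden:4 * hidden])    # candidate cell update
    c = f * c_prev + i * g                   # additive cell-state update eases gradient flow
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.normal(size=input_dim), h, c, W, b)
print(h.shape, c.shape)  # (8,) (8,)
```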
-
Chapter 07.04: Applications of RNNs
This subsection focuses on common applications of RNNs, e.g. in the context of language modelling or encoder-decoder architectures for sequence-to-sequence tasks.
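A minimal PyTorch sketch of an encoder-decoder (seq2seq) setup with GRUs, assuming illustrative vocabulary sizes and dimensions; the class name `Seq2Seq` and all hyperparameters are hypothetical:

```python
import torch
import torch.nn as nn

# Illustrative dimensions
vocab_src, vocab_tgt, emb_dim, hidden_dim = 100, 120, 32, 64

class Seq2Seq(nn.Module):
    """Encoder-decoder: the encoder compresses the source sequence into its final
    hidden state, which initialises the decoder that generates the target sequence."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_src, emb_dim)
        self.tgt_emb = nn.Embedding(vocab_tgt, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_tgt)

    def forward(self, src, tgt):
        _, context = self.encoder(self.src_emb(src))       # context: (1, batch, hidden_dim)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)                            # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, vocab_src, (2, 7))   # batch of 2 source sequences of length 7
tgt = torch.randint(0, vocab_tgt, (2, 5))   # batch of 2 target sequences of length 5
print(model(src, tgt).shape)                # torch.Size([2, 5, 120])
```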
-
Chapter 07.05: Attention and Transformers
In this subchapter, we introduce more recent sequence modelling techniques such as attention mechanisms and transformers.
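As a rough illustration of the core building block, here is a minimal NumPy sketch of (single-head, unmasked) scaled dot-product attention; the shapes are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the keys
    return weights @ V                                     # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 5, 8, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_v))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```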