Chapters
-
Topic 01: Introduction
In this chapter, we give a brief introduction to representation learning, single neurons, the XOR problem, and neural networks with a single hidden layer as well as multiple layers. Moreover, we discuss multi-class classification, matrix notation, and universal approximation.
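To make the XOR discussion concrete, below is a minimal sketch (plain NumPy, with hand-picked rather than learned weights) of a network with one hidden layer that computes XOR, a function no single neuron can represent:

```python
import numpy as np

def step(z):
    # Heaviside step activation: 1 if z > 0, else 0
    return (z > 0).astype(float)

# Hand-picked weights (illustrative, not learned):
# hidden unit 1 acts as OR, hidden unit 2 as AND, and the
# output computes OR AND (NOT AND), which is exactly XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])
b2 = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h = step(X @ W1 + b1)   # hidden layer
y = step(h @ W2 + b2)   # output layer
print(y)                # [0. 1. 1. 0.] -- XOR
```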
-
Topic 02: Optimization - Part I
This chapter introduces foundational concepts in training neural networks, including backpropagation and computational graphs, which provide a structured way to compute the gradients needed to optimize model parameters. It also discusses gradient descent and stochastic gradient descent and touches on strategies for effective learning, such as learning rate selection and weight initialization.
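As an illustrative sketch of stochastic gradient descent (a toy linear regression with a hand-derived gradient on synthetic data; not code from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # synthetic inputs
w_true = np.array([2.0, -3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy targets

w = np.zeros(2)
lr = 0.1                                      # learning rate
for step in range(200):
    # Stochastic gradient descent: one random example per update
    i = rng.integers(len(X))
    err = X[i] @ w - y[i]                     # prediction error
    grad = 2 * err * X[i]                     # gradient of the squared error
    w -= lr * grad
print(w)  # should approach [2, -3]
```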
-
Topic 03: Regularization
This chapter introduces and discusses regularization techniques for neural networks, which help prevent overfitting and improve generalization. It provides an introduction to, and geometric intuition for, L2 regularization. Additionally, it introduces dropout, a method of randomly deactivating neurons during training to enhance robustness, and early stopping, which monitors validation performance to halt training when overfitting begins.
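The two training-time techniques can be sketched in a few lines (illustrative NumPy; function names and hyperparameters are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

# L2 regularization: the penalty (lambda/2)*||w||^2 adds lambda*w
# to the gradient, shrinking weights toward zero ("weight decay").
def l2_update(w, grad, lr=0.01, lam=1e-4):
    return w - lr * (grad + lam * w)

# Inverted dropout: randomly zero activations during training and
# rescale by 1/keep_prob so expected activations match test time.
def dropout(a, keep_prob=0.8, train=True):
    if not train:
        return a                      # no dropout at test time
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob
```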
-
Topic 04: Optimization - Part II
This chapter explores advanced topics in neural network optimization, covering key challenges in training stability, initialization techniques, momentum and adaptive learning rates, and activation functions. Addressing issues like ill-conditioning, local minima, and exploding gradients, it discusses methods to improve convergence and performance, including practical weight initialization schemes, learning rate schedules, and specialized activations for hidden and output layers.
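As a minimal sketch of the momentum update on an ill-conditioned quadratic (illustrative NumPy; the matrix and hyperparameters are assumptions):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.005, beta=0.9):
    # Momentum accumulates an exponentially decaying average of past
    # gradients, damping oscillations along high-curvature directions.
    v = beta * v - lr * grad
    return w + v, v

# Example: minimize the ill-conditioned quadratic f(w) = 0.5 * w^T A w
A = np.diag([1.0, 100.0])                 # condition number 100
w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    grad = A @ w
    w, v = momentum_step(w, v, grad)
print(w)  # close to the minimum at [0, 0]
```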
-
Topic 05: Convolutional Neural Networks - Part I
This chapter introduces convolutional neural networks (CNNs), one of the most popular components of deep learning architectures. CNNs are widely applied in domains such as natural language processing (NLP), audio, and time-series data. In this part, we introduce CNNs, their properties and components, the differences between CNNs and fully connected networks (FCNs), as well as the math behind CNNs.
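The core operation can be sketched in plain NumPy: a naive "valid" 2D convolution (strictly speaking cross-correlation, as implemented by most deep learning frameworks), with illustrative shapes:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is a dot product of the kernel with a
            # local patch: the same weights are shared at every position.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d(x, k).shape)  # (4, 4)
```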
-
Topic 06: Convolutional Neural Networks - Part II
This chapter introduces 1D, 2D, and 3D convolutions, highlighting how CNNs adapt to sequential, spatial, and volumetric data and thus expand their use across diverse applications. Additionally, we explore advanced CNN techniques for enhancing feature extraction, including dilated and transposed convolutions, which are used to expand receptive fields and upsample outputs. Furthermore, we dive into separable convolutions, which improve computational efficiency by decomposing the convolution operation.
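A minimal sketch of these three operations using PyTorch's built-in layers (channel counts and kernel sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)

# Dilated convolution: inserts gaps between kernel taps, expanding the
# receptive field (here 3x3 taps span 5x5) without extra parameters.
dilated = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)
print(dilated(x).shape)        # torch.Size([1, 8, 32, 32])

# Transposed convolution: upsamples, here doubling spatial resolution.
up = nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2)
print(up(x).shape)             # torch.Size([1, 8, 64, 64])

# Depthwise-separable convolution: a per-channel spatial convolution
# (groups=channels) followed by a 1x1 pointwise mix, far cheaper than
# a full 3x3 convolution over all channel pairs.
depthwise = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8)
pointwise = nn.Conv2d(8, 16, kernel_size=1)
print(pointwise(depthwise(x)).shape)  # torch.Size([1, 16, 32, 32])
```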
-
Topic 07: Deep Recurrent Neural Networks
This chapter introduces Recurrent Neural Networks (RNNs), designed to process sequential data by retaining information over time. It covers the backpropagation through time (BPTT) algorithm for training RNNs, highlighting key challenges like exploding and vanishing gradients. To address these issues, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) are introduced as enhanced architectures with gating mechanisms that better manage information flow. In addition, the chapter briefly introduces more recent approaches for modelling sequence data such as attention and transformers.
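The recurrence at the heart of a vanilla RNN can be sketched in a few lines of NumPy (illustrative dimensions; LSTM and GRU gating builds on the same pattern):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 10                 # input size, hidden size, sequence length
W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

xs = rng.normal(size=(T, d_in))         # a toy input sequence
h = np.zeros(d_h)
for x_t in xs:
    # The same weights are reused at every time step; backpropagation
    # through time unrolls this loop, which is why gradients can vanish
    # or explode over long sequences (repeated products with W_hh).
    h = np.tanh(x_t @ W_xh + h @ W_hh + b)
print(h.shape)  # (8,) -- the final hidden state summarizes the sequence
```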
-
Topic 08: Autoencoders
This chapter introduces unsupervised learning with a focus on autoencoders (AEs), which learn compact representations of data without labeled outputs. It explains the structure of AEs, including encoder-decoder frameworks and their use in dimensionality reduction and feature extraction. The chapter also explores regularized variants such as overcomplete, sparse, denoising, and contractive AEs, highlighting their unique roles in improving representation quality. Finally, it covers convolutional AEs for image data and manifold learning concepts.
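A minimal sketch of the encoder-decoder structure in PyTorch (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# The encoder compresses 784-dim inputs (e.g. flattened 28x28 images)
# into a 32-dim code; the decoder reconstructs the input from the code.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(16, 784)                  # a toy batch
recon = decoder(encoder(x))              # reconstruction
loss = nn.functional.mse_loss(recon, x)  # reconstruction error, no labels
loss.backward()
```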
-
Topic 09: Generative Adversarial Neural Networks
Generative Adversarial Networks (GANs) are a class of machine learning models that consist of two competing networks: a generator, which creates data samples, and a discriminator, which evaluates their authenticity. This chapter introduces the core principles of GANs, explores popular variants and addresses challenges in optimizing them.
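The adversarial game can be sketched as a toy PyTorch training loop (the 1-D data and network sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 2 + 3          # toy "real" data: N(3, 2)
    fake = G(torch.randn(64, 8))               # generator samples from noise

    # Discriminator step: classify real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into predicting 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```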