In this subsection, we discuss neural network architectures for multi-class classification, the softmax activation function, and the softmax loss.