In this subchapter we take a closer look at stochastic gradient descent (SGD) and discuss its stochastic behaviour and its convergence behaviour. Furthermore, we examine how the batch size affects these properties.