Machine learning and stochastic gradient descent

I very briefly describe gradient descent (GD) and how it is used in machine learning via the stochastic gradient descent (SGD) algorithm. The idea is simple: divide your data into randomly chosen mini-batches and use each mini-batch to estimate the gradient of the cost function. Perform a GD step with that estimate, then move on to the next mini-batch; one pass over all the mini-batches defines an epoch. Repeat over many epochs, and your network will hopefully learn!
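As a concrete illustration of the loop described above, here is a minimal mini-batch SGD sketch for a one-dimensional linear fit. The problem, learning rate, and batch size are all illustrative choices, not anything prescribed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 1 plus a little noise (hypothetical example)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0                      # parameters to learn
lr, batch_size, epochs = 0.1, 20, 50 # illustrative hyperparameters

for epoch in range(epochs):
    # Randomly reshuffle the data into mini-batches each epoch
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        # Mini-batch estimate of the gradient of the mean-squared-error cost
        err = w * xb + b - yb
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        # One GD step using the mini-batch gradient estimate
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)
```

After enough epochs the parameters should approach the true values w ≈ 3, b ≈ 1; the inner loop over mini-batches is exactly the "one epoch" described above.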

Video. Notes.