
Gradient Descent Learning Rates

Figure 1: Gradient descent with different learning rates



Gradient Descent is defined mathematically as:

$$ x_{n+1} = x_n - \alpha \frac{df(x_n)}{dx} $$

where $\alpha$ is the learning rate, a hyperparameter that controls the step size. Choosing a good learning rate is important: a very low value may take a long time to converge, while a very high value may overshoot the minimum and fail to converge at all.

Figure 1 shows the impact of different learning rates on gradient descent.

The program below demonstrates different learning rates for gradient descent using the TensorFlow library.

Implementation of Gradient Descent Learning Rates
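The original listing is not reproduced here; the following is a minimal plain-Python sketch of the same update rule (the post itself used TensorFlow). It minimizes the toy function $f(x) = x^2$ from the hypothetical starting point $x_0 = 5$, and the learning-rate values chosen below are illustrative assumptions, not taken from the post.

```python
def gradient_descent(grad_f, lr, x0=5.0, steps=20):
    """Iterate x_{n+1} = x_n - lr * f'(x_n) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# f(x) = x^2 has gradient f'(x) = 2x and its minimum at x = 0.
grad = lambda x: 2 * x

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: x after 20 steps = {gradient_descent(grad, lr):.4f}")
```

Running the sketch illustrates the behavior described above: lr=0.01 crawls toward the minimum, lr=0.1 converges quickly, lr=0.9 oscillates but still shrinks toward zero, and lr=1.1 diverges with each step.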




Copyright © 2020-2021 Gajanan Bhat. All rights reserved.