
Gradient Descent

Figure 1: Gradient Descent on a convex function



Gradient Descent is an optimization algorithm used to find the minimum of a function. It works iteratively, moving toward a lower point at every step, and stops once it can find no lower point.

Mathematically, the algorithm can be written as: $$ x_{n+1} = x_n - \alpha \left. \frac{df(x)}{dx} \right|_{x = x_n} $$

Here $\alpha$ is the learning rate.

The value of $x_{n+1}$ is computed iteratively until the sequence converges to a minimum.
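A minimal sketch of this update rule in plain Python (the example function $f(x) = x^2$ and all names here are illustrative, not from the original post):

```python
def gradient_descent(grad, x0, alpha=0.1, tol=1e-6, max_iters=1000):
    """Iterate x_{n+1} = x_n - alpha * f'(x_n) until the step becomes tiny."""
    x = x0
    for _ in range(max_iters):
        step = alpha * grad(x)
        x -= step
        if abs(step) < tol:  # converged: no meaningfully lower point found
            break
    return x

# Minimize f(x) = x**2, whose derivative is 2*x.
minimum = gradient_descent(grad=lambda x: 2 * x, x0=5.0)
print(minimum)  # close to 0.0, the minimum of x**2
```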

The learning rate is a hyperparameter that controls the step size. Choosing a good value is important: a very low learning rate may take a long time to converge, while a very high one may cause the algorithm to not converge at all. We can see the impact of the learning rate on the algorithm in Gradient Descent Learning Rates.
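A small experiment, reusing the same illustrative $f(x) = x^2$, shows this effect (the specific learning rates are assumptions chosen for demonstration):

```python
def run(alpha, x0=5.0, iters=50):
    """Run 50 gradient descent steps on f(x) = x**2 with a given learning rate."""
    x = x0
    for _ in range(iters):
        x -= alpha * 2 * x  # gradient of x**2 is 2*x
    return x

for alpha in (0.001, 0.1, 1.1):
    print(f"alpha={alpha}: x after 50 steps = {run(alpha):.4g}")
# alpha=0.001 barely moves from the start point,
# alpha=0.1 converges close to the minimum at 0,
# alpha=1.1 overshoots every step and the iterates grow without bound.
```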

If the target function is not convex and contains both local and global minima, gradient descent may converge to a local minimum instead of the global one, depending on the starting point.
There are variations of the algorithm that can work better, especially on non-convex functions; see Gradient Descent with Momentum and Optimization with Momentum.
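As a rough sketch of the momentum idea (the hyperparameter values and the non-convex example function below are assumptions for illustration, not the method described in the linked posts): a velocity term accumulates past gradients, which can carry the iterate through shallow local minima that plain gradient descent would get stuck in.

```python
def momentum_descent(grad, x0, alpha=0.01, beta=0.9, iters=500):
    """Gradient descent with momentum: velocity v accumulates past gradients."""
    x, v = x0, 0.0
    for _ in range(iters):
        v = beta * v - alpha * grad(x)  # mix decayed old velocity with new gradient
        x += v
    return x

# Non-convex example f(x) = x**4 - 3*x**2 + x, with a local minimum near
# x = 1.13 and the global minimum near x = -1.3.
grad = lambda x: 4 * x**3 - 6 * x + 1
print(momentum_descent(grad, x0=2.0))
# With these settings the accumulated velocity carries the iterate past the
# local minimum near 1.13; plain gradient descent from x0=2.0 stops there.
```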

The program below finds the minimum of a convex function using gradient descent, implemented with the TensorFlow library.

Implementation of Gradient Descent
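A minimal sketch of such a program, using tf.GradientTape with the tf.keras.optimizers.SGD optimizer (one possible TensorFlow approach, not necessarily the author's exact implementation; the convex target $f(x) = x^2$ and the starting point are assumptions for illustration):

```python
import tensorflow as tf

x = tf.Variable(5.0)  # starting point for the descent
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = x * x  # convex target f(x) = x**2
    grads = tape.gradient(loss, [x])   # compute df/dx at the current x
    opt.apply_gradients(zip(grads, [x]))  # one gradient descent step

print(x.numpy())  # close to 0.0, the minimum of x**2
```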




Copyright © 2020-2021 Gajanan Bhat. All rights reserved.