Figure 1: Gradient Descent with momentum on a convex function
Figure 2: Gradient Descent with momentum on a non-convex function
We saw how we can use Gradient Descent to find the minimum of a function. Gradient Descent with Momentum is a variation that can speed up convergence and behave better on non-convex functions.
This algorithm is analogous to rolling a ball down a slope: the ball gradually accumulates momentum and rolls faster and faster.
Momentum can help the optimizer cross some of the local minima when the function is non-convex, as we can see in Figure 2.
Mathematically, the algorithm can be written as: $$ v_{n+1} = \beta \, v_n + \frac{df}{dx}(x_n) $$ $$ x_{n+1} = x_n - \alpha \, v_{n+1} $$
Here $\beta$ is the momentum term and $\alpha$ is the learning rate.
The momentum term $\beta$ is usually set to 0.9 or a nearby value. If we set $\beta$ to 0, we recover the plain Gradient Descent update.
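To make the update rule concrete, here is a minimal sketch in Python. The function name `momentum_gd` and the default parameter values are illustrative choices, not from the original; the loop simply applies the two equations above to a one-dimensional function.

```python
def momentum_gd(grad, x0, alpha=0.1, beta=0.9, n_steps=100):
    """Minimize a 1-D function via gradient descent with momentum."""
    x, v = x0, 0.0
    for _ in range(n_steps):
        v = beta * v + grad(x)   # v_{n+1} = beta * v_n + f'(x_n)
        x = x - alpha * v        # x_{n+1} = x_n - alpha * v_{n+1}
    return x

# Example: f(x) = x^2 has gradient 2x and its minimum at x = 0.
x_min = momentum_gd(grad=lambda x: 2 * x, x0=5.0)
print(x_min)  # approaches 0; with beta=0 this reduces to plain gradient descent
```

Note that the velocity `v` carries information from previous gradients, which is exactly what lets the iterate keep moving through flat regions or shallow local minima.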