Figure 1: Gradient Descent and RMSProp algorithms with and without momentum
- We saw an example of Gradient Descent with momentum optimization. Another algorithm that supports momentum is RMSProp (Root Mean Square Propagation).
- In this example we will use both algorithms, with and without momentum, to find the minimum of a non-convex function.
- We can see that the algorithms with momentum are able to find the global minimum, whereas the ones without momentum reach only a local minimum.
- We can also note that Gradient Descent with momentum converges faster.
- The program below shows the usage of SGD and RMSProp with and without momentum.
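The original program is not reproduced here, so the following is a minimal sketch of the idea. It assumes PyTorch, a hypothetical 1-D non-convex test function `f(x) = x**4 - 4*x**2 + x`, and illustrative hyperparameters (learning rate 0.01, momentum 0.9, start at x = 2.0); whether a given run actually escapes the local minimum depends on those choices, so treat them as a starting point for experimentation.

```python
import torch

# Hypothetical 1-D non-convex test function: a shallow local minimum near x ~ +1.4
# and a deeper global minimum near x ~ -1.45 (the original program's function is not shown).
def f(x):
    return x**4 - 4 * x**2 + x

def run(opt_name, lr, momentum, x0=2.0, steps=500):
    # A single scalar parameter, updated in place by the chosen optimizer.
    x = torch.tensor(x0, requires_grad=True)
    if opt_name == "SGD":
        opt = torch.optim.SGD([x], lr=lr, momentum=momentum)
    else:
        opt = torch.optim.RMSprop([x], lr=lr, momentum=momentum)
    for _ in range(steps):
        opt.zero_grad()
        loss = f(x)
        loss.backward()
        opt.step()
    return x.item(), f(x).item()

# Illustrative hyperparameters only; the outcome is sensitive to the learning rate,
# momentum coefficient, and starting point.
for name in ("SGD", "RMSprop"):
    for momentum in (0.0, 0.9):
        x_final, f_final = run(name, lr=0.01, momentum=momentum)
        print(f"{name:8s} momentum={momentum}: x = {x_final:+.4f}, f(x) = {f_final:+.4f}")
```

Printing the final position and function value for all four configurations makes it easy to compare which runs settle in the local minimum and which reach the deeper one.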