
Gradient Descent Nesterov Momentum

Figure 1: Gradient Descent Nesterov Momentum



Nesterov momentum, or Nesterov accelerated gradient (NAG), is an improvement over Gradient Descent with momentum.

Mathematically, the algorithm can be written as: $$ v_{n+1} = \beta * v_n + \frac{d f(x_n - \beta * v_n)}{dx} $$ $$ x_{n+1} = x_n - \alpha * v_{n+1} $$

Here $\beta$ is the momentum term and $\alpha$ is the learning rate. The momentum term $\beta$ is usually set to 0.9 or a nearby value.
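
As a quick worked example (the quadratic $f(x) = x^2$, the values $\alpha = 0.05$, $x_0 = 1$, $v_0 = 1$, and $\beta = 0.9$ are assumed here for illustration, not taken from the animation), one update first looks ahead to where the momentum is about to carry $x$, then evaluates the gradient there:

$$ x_0 - \beta * v_0 = 1 - 0.9 * 1 = 0.1 $$ $$ v_1 = 0.9 * 1 + \frac{d f}{dx}\Big|_{x=0.1} = 0.9 + 0.2 = 1.1 $$ $$ x_1 = 1 - 0.05 * 1.1 = 0.945 $$

The gradient is taken at the look-ahead point $0.1$ rather than at the current point $1$, which is what distinguishes Nesterov momentum from plain momentum.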

We can see in the animation above that Nesterov momentum converges faster than Gradient Descent with momentum.

For more details on Nesterov momentum, see:

Stanford CS231n course

https://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient

Implementation of Gradient Descent Nesterov Momentum
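
Below is a minimal sketch of the update rule given above, written in plain NumPy. The generic `nesterov_momentum` helper, the example quadratic objective, and the hyperparameter values are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def nesterov_momentum(grad, x0, alpha=0.05, beta=0.9, n_iters=200):
    """Minimize a function using the Nesterov momentum update above.

    grad    : callable returning the gradient at a point
    x0      : starting point (scalar or NumPy array)
    alpha   : learning rate
    beta    : momentum term
    n_iters : number of update steps
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_iters):
        # Evaluate the gradient at the look-ahead point x - beta * v,
        # then update the velocity and the parameter.
        v = beta * v + grad(x - beta * v)
        x = x - alpha * v
    return x

# Illustrative objective: f(x) = x^2, whose gradient is 2x.
grad = lambda x: 2.0 * x
print(nesterov_momentum(grad, x0=5.0))  # approaches the minimum at 0
```

With these assumed settings, the iterate starting at $x_0 = 5$ ends up close to the minimum at $0$.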



