Figure 1: Gradient Descent with Nesterov Momentum
Nesterov momentum, or Nesterov accelerated gradient (NAG), is an improvement over Gradient Descent with momentum: instead of evaluating the gradient at the current position, it evaluates it at the "look-ahead" position reached by first applying the momentum step.
Mathematically this algorithm can be written as: $$ v_{n+1} = \beta * v_n + \frac {d f(x_n - \beta * v_n)}{dx} $$ $$ x_{n+1} = x_n - \alpha * v_{n+1} $$
Here $\beta$ is the momentum term and $\alpha$ is the learning rate. The momentum term $\beta$ is usually set to 0.9 or a nearby value.
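Below is a minimal NumPy sketch of the update rule above. The function name `nesterov_momentum` and the parameters `grad`, `x0`, and `n_steps` are illustrative choices, not part of the original text; the test function $f(x) = x^2$ is just an example.

```python
import numpy as np

def nesterov_momentum(grad, x0, alpha=0.01, beta=0.9, n_steps=100):
    """Minimize f with Nesterov accelerated gradient.

    grad  -- function returning the gradient df/dx at a given point
    x0    -- starting point
    alpha -- learning rate
    beta  -- momentum term
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_steps):
        # Evaluate the gradient at the look-ahead point x - beta * v,
        # then update the velocity and the position.
        v = beta * v + grad(x - beta * v)
        x = x - alpha * v
    return x

# Example: minimize f(x) = x^2, whose gradient is 2x.
x_min = nesterov_momentum(lambda x: 2 * x, x0=[5.0])
print(x_min)  # moves toward the minimum at 0
```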
We can see in the animation above that Nesterov momentum converges faster than Gradient Descent with momentum.
For more details on Nesterov momentum, visit:
https://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient