SGD + Momentum

Strategy

Add momentum to plain SGD so that it “memorizes” the steps it has already taken:
Use a “velocity” term to accumulate past gradients.
Use a “friction” term $\rho$ to reduce the influence of older gradients on the current step.

Mathematical Expression

$$
\begin{aligned}
v_{t+1} &= \rho v_{t} + \nabla f(x_{t}) \\
x_{t+1} &= x_{t} - \alpha v_{t+1}
\end{aligned}
$$
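Unrolling the velocity recursion shows what the friction term does. This assumes $v_0 = 0$, as in the implementation. The value $\rho = 0.9$ in the worked numbers is only an example choice.

$$
v_{t+1} = \sum_{k=0}^{t} \rho^{k} \, \nabla f(x_{t-k})
$$

A gradient from $k$ steps ago is weighted by $\rho^{k}$. With $\rho = 0.9$, a gradient from 10 steps ago keeps a weight of only $0.9^{10} \approx 0.35$. If the gradient stays roughly constant at $g$, the velocity approaches $g / (1 - \rho)$, which is $10g$ for $\rho = 0.9$.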

Implementation

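```python
# SGD + Momentum
import numpy as np

def sgd_momentum(w, compute_gradient, learning_rate, rho, num_steps):
    v = np.zeros_like(w)                # velocity starts at zero
    for _ in range(num_steps):
        dw = compute_gradient(w)        # gradient at the current point x_t
        v = rho * v + dw                # decay old velocity, add new gradient
        w = w - learning_rate * v       # step along the velocity
    return w
```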

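There are different ways to implement SGD + Momentum, but they produce the same sequence of $x_{t}$. For example, folding the learning rate into the velocity ($v_{t+1} = \rho v_{t} - \alpha \nabla f(x_{t})$, $x_{t+1} = x_{t} + v_{t+1}$) gives the same iterates as long as $\alpha$ is constant.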
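This equivalence can be checked numerically. The sketch below runs two versions side by side. The first matches the update rule used above. The second is the common variant that scales the velocity by the learning rate, so that $v' = -\alpha v$. The quadratic objective, the learning rate, and $\rho$ are hypothetical choices made only for illustration.

```python
import numpy as np

def grad(x):
    # Hypothetical test objective: f(x) = 0.5 * (x0**2 + 10 * x1**2)
    return np.array([1.0, 10.0]) * x

def momentum_v1(x0, lr, rho, steps):
    """Form used above: v accumulates raw gradients; lr is applied at the update."""
    x, v = x0.copy(), np.zeros_like(x0)
    xs = [x.copy()]
    for _ in range(steps):
        v = rho * v + grad(x)
        x = x - lr * v
        xs.append(x.copy())
    return np.array(xs)

def momentum_v2(x0, lr, rho, steps):
    """Variant: lr folded into the velocity (v' = -lr * v), then x += v'."""
    x, v = x0.copy(), np.zeros_like(x0)
    xs = [x.copy()]
    for _ in range(steps):
        v = rho * v - lr * grad(x)
        x = x + v
        xs.append(x.copy())
    return np.array(xs)

x0 = np.array([1.0, 1.0])
a = momentum_v1(x0, lr=0.05, rho=0.9, steps=50)
b = momentum_v2(x0, lr=0.05, rho=0.9, steps=50)
print(np.allclose(a, b))  # True: identical trajectories up to rounding
```

By induction, $v'_{t+1} = \rho(-\alpha v_t) - \alpha \nabla f(x_t) = -\alpha v_{t+1}$. The two position updates therefore coincide. They stop coinciding once $\alpha$ changes between steps, for example under a learning-rate schedule.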

Resolved Problem

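In shallow dimensions, where gradients are small but consistent, the accumulated “velocity” keeps the step size reasonably large.
In steep dimensions, plain SGD overshoots and oscillates. There, successive gradients point in opposite directions and partly cancel in the velocity. The steps therefore shrink and the oscillation is damped.
At a saddle point or local minimum, where the gradient is (near) zero, the remaining “velocity” can carry the iterate through it.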
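The three effects above can be illustrated with a small experiment. The sketch below compares plain SGD ($\rho = 0$) with SGD + Momentum on two hypothetical 1D/2D objectives. Both objectives, the learning rates, and the step counts are assumptions chosen for illustration. The plateau in the second demo is only a crude stand-in for the near-zero gradient around a saddle point.

```python
import numpy as np

def run(grad, x0, lr, rho, steps):
    """SGD + Momentum; rho = 0 reduces it to plain SGD."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = rho * v + grad(x)   # velocity update
        x = x - lr * v          # position update
    return x

# Demo 1 (hypothetical): ill-conditioned quadratic f(x, y) = 0.5 * (x**2 + 100 * y**2).
# x is the shallow dimension, y the steep one.
def quad_grad(p):
    return np.array([1.0, 100.0]) * p

# lr = 0.0199 sits just under plain SGD's stability limit 2/100, so plain SGD
# overshoots in y every step (factor -0.99) and crawls in x (factor 0.9801).
sgd = run(quad_grad, [1.0, 1.0], lr=0.0199, rho=0.0, steps=100)
mom = run(quad_grad, [1.0, 1.0], lr=0.0199, rho=0.9, steps=100)
print(np.linalg.norm(sgd), np.linalg.norm(mom))  # roughly 0.39 vs. under 0.01

# Demo 2 (hypothetical): a 1D plateau. Downhill for x < 0, flat on [0, 2),
# downhill again for x >= 2.
def plateau_grad(p):
    x = p[0]
    return np.array([0.0 if 0.0 <= x < 2.0 else -1.0])

sgd_p = run(plateau_grad, [-1.0], lr=0.1, rho=0.0, steps=200)
mom_p = run(plateau_grad, [-1.0], lr=0.1, rho=0.9, steps=200)
print(sgd_p[0], mom_p[0])  # plain SGD stalls on the plateau; momentum crosses it
```

In the first demo, momentum shrinks the error in both directions at roughly the same geometric rate, $\sqrt{\rho} \approx 0.95$ per step. Plain SGD is limited to $0.99$ per step in y and $0.98$ per step in x. In the second demo, plain SGD stops as soon as the gradient vanishes. The velocity built up on the slope carries the momentum iterate across the flat region.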

New Problem

SGD + Momentum chooses the direction to move, $v_{t+1} = \rho v_{t} + \nabla f(x_{t})$, based on:

The past steps, $v_{t}$
The current position, $x_{t}$

Intuitively, however, the gradient should be evaluated closer to where the step will land, $x_{t+1}$, rather than at $x_{t}$. Nesterov Momentum, introduced later, addresses this problem.

Chilfox
