Introduction
Intuition
A gradient is a vector that points in the direction of steepest increase of the function (here $f$) at the given point
The additive inverse of the gradient vector points in the direction of steepest decrease of the function at the given point
Math Definition
If $f$ is a function of two variables $x$ and $y$, then the gradient of $f$ is the vector function $\nabla f$ defined by

$$\nabla f(x, y) = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$
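As a small worked example (not from the original notes, but following the definition above): take $f(x, y) = x^2 + 3y$. Differentiating with respect to each variable gives

```latex
\nabla f(x, y)
= \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle
= \langle 2x,\ 3 \rangle
```

At the point $(1, 0)$ this evaluates to $\langle 2, 3 \rangle$, the direction of steepest increase there.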
Numeric Gradient
Strategy
The numeric gradient relies on the limit definition of the derivative:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

This strategy observes how the loss changes under a small change in one weight. Then, by dividing the change in loss by the small change in weight, we can approximate that partial derivative
By repeating this $n$ times, where $n$ is the dimension of the weight, we can get the full gradient
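The loop described above can be sketched as follows. This is a minimal illustration, not the notes' original code; it assumes the loss takes a NumPy array of weights and returns a scalar, and it uses the centered difference $(f(w+h) - f(w-h)) / 2h$, a slightly more accurate variant of the formula above.

```python
import numpy as np

def numeric_gradient(loss_fn, w, h=1e-5):
    """Approximate the gradient of loss_fn at w, one dimension at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        # Nudge this single weight up and down by h.
        w.flat[i] = old + h
        loss_plus = loss_fn(w)
        w.flat[i] = old - h
        loss_minus = loss_fn(w)
        w.flat[i] = old  # restore the original value
        # Change in loss divided by change in weight (2h total).
        grad.flat[i] = (loss_plus - loss_minus) / (2 * h)
    return grad

# Toy example: L(w) = sum(w^2) has true gradient 2w.
w = np.array([1.0, -2.0, 3.0])
g = numeric_gradient(lambda w: np.sum(w ** 2), w)
```

Note that each dimension of `w` costs extra evaluations of the loss, which is the source of the slowness discussed below.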
Advantages & Disadvantages
Advantages:
- Easy to implement
Disadvantages:
- Slow: requires $O(n)$ loss evaluations, where $n$ is the dimension of the weight
- This method only “approximates” the gradient numerically rather than computing the exact value
Analytic Gradient
Strategy
We apply calculus rules to the loss function $L$ and derive its gradient $\nabla L$ analytically. After that, we can plug in the coordinates (the pixel values of the image) and get the exact gradient
Sketch of Implementation
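As a hypothetical sketch (the notes' own implementation is not shown here): suppose the loss is $L(w) = \sum_i w_i^2$. Applying calculus rules once gives the closed-form gradient $\nabla L = 2w$, so evaluating the gradient at any point is a single expression rather than $n$ loss evaluations.

```python
import numpy as np

def loss(w):
    """Toy loss L(w) = sum of squared weights."""
    return np.sum(w ** 2)

def analytic_gradient(w):
    """Exact gradient of the toy loss, derived by hand: dL/dw = 2w."""
    return 2 * w

w = np.array([1.0, -2.0, 3.0])
g = analytic_gradient(w)  # exact value, one cheap pass over w
```

In a real network the hand derivation is replaced by backpropagation, but the idea is the same: the gradient formula is derived once, then evaluated exactly.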
Advantages & Disadvantages
Advantages:
- Fast: computed efficiently via backpropagation
- Gives the exact gradient value rather than an approximation
Disadvantage:
- Error-Prone: Hard to implement, often has bugs
In the real world, we'll almost always use the analytic gradient rather than the numeric gradient
Gradient Check
We've mentioned that the analytic gradient is error-prone while the numeric gradient is easy to implement. Hence, we can use the numeric gradient to check the correctness of the analytic gradient. This step is called a “gradient check”
To overcome the inefficiency of the numeric gradient, a gradient check instead uses lower-dimensional samples. This way, we can verify the analytic gradient's correctness without wasting much time
PyTorch offers the `torch.autograd.gradcheck` function for gradient checks