Introduction

Intuition

A gradient is a vector that points in the direction of the steepest increase of the function (here $f$) at the given point

The additive inverse of the gradient, $-\nabla f$, points in the direction of the steepest decrease of the function at the given point

Math Definition

If $f$ is a function of two variables $x$ and $y$, then the gradient of $f$ is the vector function $\nabla f$ defined by

$$\nabla f(x, y) = \left\langle \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right\rangle = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j}$$
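
As a quick worked example (the specific function is an illustration, not from the notes), take $f(x, y) = x^2 + y^2$:

```latex
\nabla f(x, y)
  = \left\langle \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right\rangle
  = \langle 2x,\ 2y \rangle
% At the point (1, 2) this gives (2, 4), which points straight away from
% the origin: the direction in which x^2 + y^2 increases fastest.
```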
Numeric Gradient

Strategy

The numeric gradient relies on the limit definition of the derivative:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

In practice we use a small finite $h$ (e.g. $10^{-5}$) instead of taking the limit

This strategy observes how the loss changes in response to a small change in one weight. Then, by dividing the change in loss by the small change in weight, we can approximate the partial derivative with respect to that weight

By doing the same thing $n$ times, where $n$ is the dimension of the weight, we can get the full gradient
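
The loop over dimensions can be sketched as follows. This is a minimal illustration, not the notes' own code; it uses a centered difference (a common, more accurate variant of the formula above), and the toy loss $f(w) = \sum_i w_i^2$ is an assumption for demonstration:

```python
import numpy as np

def numeric_gradient(f, w, h=1e-5):
    """Approximate the gradient of f at w with centered differences.

    Loops over every dimension of w, so it costs about 2n evaluations
    of f for an n-dimensional weight vector.
    """
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + h            # nudge one weight up
        f_plus = f(w)
        w.flat[i] = old - h            # nudge the same weight down
        f_minus = f(w)
        w.flat[i] = old                # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Toy loss f(w) = sum(w**2), assumed for illustration
w = np.array([1.0, -2.0, 3.0])
print(numeric_gradient(lambda v: float(np.sum(v ** 2)), w))  # ≈ [2, -4, 6]
```

Note that each entry of the gradient needs its own pair of loss evaluations, which is exactly why this method is slow for high-dimensional weights.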

Advantages & Disadvantages

Advantages:

  1. Easy to implement

Disadvantages:

  1. Slow: $O(n)$ loss evaluations, where $n$ is the dimension of the weight
  2. This method approximates the gradient numerically rather than computing the exact value

Analytic Gradient

Strategy

We apply calculus rules to the loss function and derive an expression for its gradient $\nabla_W L$. After that, we can plug in the coordinates (the pixel values of the image) and get the exact gradient

Sketch of Implementation
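
A minimal sketch, assuming a toy loss $f(w) = \sum_i w_i^2$ (an illustration only; a real loss such as SVM or softmax would have its own hand-derived gradient):

```python
import numpy as np

# Toy loss f(w) = sum(w**2); assumed for illustration
def loss(w):
    return float(np.sum(w ** 2))

def analytic_gradient(w):
    # Derived once with calculus: d/dw_i of sum_j w_j**2 is 2*w_i
    return 2 * w

w = np.array([1.0, -2.0, 3.0])
grad = analytic_gradient(w)      # exact, and costs a single cheap pass
w = w - 0.1 * grad               # one gradient-descent step using it
print(grad, w)
```

Unlike the numeric version, the expensive work (the calculus) happens once on paper; evaluating the resulting expression is cheap no matter how many dimensions the weight has.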

Advantages & Disadvantages

Advantages:

  1. Fast: can be computed efficiently via backpropagation
  2. Gives the exact gradient value rather than an approximation

Disadvantage:

  1. Error-Prone: Hard to implement, often has bugs

In the real world, we'll almost always use the analytic gradient rather than the numeric gradient


Gradient Check

We’ve mentioned that the analytic gradient is error-prone while the numeric gradient is easy to implement. Hence, we can use the numeric gradient to check the correctness of the analytic gradient. We call this step a “gradient check”

To overcome the inefficiency of the numeric gradient, in a gradient check we’ll instead use lower-dimensional samples, e.g. only a few randomly chosen dimensions. This way, we can check the analytic gradient’s correctness without wasting so much time
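
A sampled check might look like this sketch; the toy loss and its analytic gradient $2w$ are assumptions for illustration:

```python
import numpy as np

def gradient_check_sampled(f, analytic_grad, w, num_checks=5, h=1e-5):
    """Compare the analytic gradient against a numeric one on a few
    randomly sampled dimensions instead of all of them."""
    rng = np.random.default_rng(0)
    max_rel_error = 0.0
    for i in rng.choice(w.size, size=num_checks, replace=False):
        old = w.flat[i]
        w.flat[i] = old + h
        f_plus = f(w)
        w.flat[i] = old - h
        f_minus = f(w)
        w.flat[i] = old                          # restore the weight
        numeric = (f_plus - f_minus) / (2 * h)   # centered difference
        denom = max(abs(numeric) + abs(analytic_grad.flat[i]), 1e-12)
        max_rel_error = max(max_rel_error,
                            abs(numeric - analytic_grad.flat[i]) / denom)
    return max_rel_error

# Toy setup (assumed): loss f(w) = sum(w**2), analytic gradient 2*w
w = np.random.default_rng(1).standard_normal(100)
err = gradient_check_sampled(lambda v: float(np.sum(v ** 2)), 2 * w, w)
print(err)   # a tiny relative error suggests the analytic gradient is correct
```

Only `num_checks` dimensions are probed, so the cost stays constant even when the weight has millions of entries.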

PyTorch offers the torch.autograd.gradcheck function for gradient checks
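
A minimal usage example (the function and tensor here are illustrative assumptions); gradcheck compares autograd's analytic gradients against numeric ones, and expects double-precision inputs with requires_grad=True:

```python
import torch
from torch.autograd import gradcheck

# Double precision keeps the numeric approximation accurate enough
x = torch.randn(4, dtype=torch.double, requires_grad=True)

def f(x):
    return (x ** 2).sum()

# Returns True if analytic and numeric gradients agree within tolerance,
# and raises an error otherwise
print(gradcheck(f, (x,), eps=1e-6, atol=1e-4))
```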