Introduction
Intuition
A gradient is a vector that points in the direction of steepest increase of the function (here $f$) at the given point
The additive inverse of the gradient vector points in the direction of steepest decrease of the function at the given point
Math Definition
If $f$ is a function of two variables $x$ and $y$, then the gradient of $f$ is the vector function $\nabla f$ defined by

$$\nabla f(x, y) = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$
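As a small worked example (not from the original notes, but following the definition above): take $f(x, y) = x^2 + 3y$. Differentiating with respect to each variable gives

```latex
\nabla f(x, y)
= \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle
= \langle 2x,\ 3 \rangle
```

At the point $(1, 0)$ this evaluates to $\langle 2, 3 \rangle$, the direction of steepest increase there.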
Numeric Gradient
Strategy
The numeric gradient relies on the limit definition of the derivative:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

This strategy observes how the loss changes under a small change in one weight. Then, by dividing the change in loss by the small change in weight, we can approximate that partial derivative
By repeating this $n$ times, where $n$ is the dimension of the weight, we can get the full gradient
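The loop described above can be sketched as follows. This is a minimal illustration, not the notes' original code; it assumes the loss takes a NumPy array of weights and returns a scalar, and it uses the centered difference $(f(w+h) - f(w-h)) / 2h$, a slightly more accurate variant of the formula above.

```python
import numpy as np

def numeric_gradient(loss_fn, w, h=1e-5):
    """Approximate the gradient of loss_fn at w, one dimension at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        # Nudge this single weight up and down by h.
        w.flat[i] = old + h
        loss_plus = loss_fn(w)
        w.flat[i] = old - h
        loss_minus = loss_fn(w)
        w.flat[i] = old  # restore the original value
        # Change in loss divided by change in weight (2h total).
        grad.flat[i] = (loss_plus - loss_minus) / (2 * h)
    return grad

# Toy example: L(w) = sum(w^2) has true gradient 2w.
w = np.array([1.0, -2.0, 3.0])
g = numeric_gradient(lambda w: np.sum(w ** 2), w)
```

Note that each dimension of `w` costs extra evaluations of the loss, which is the source of the slowness discussed below.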
Advantages & Disadvantages
Advantages:
- Easy to implement
Disadvantages:
- Slow: requires $O(n)$ loss evaluations, where $n$ is the dimension of the weight
- This method only “approximates” the gradient numerically rather than computing the exact value
Analytic Gradient
Strategy
We apply calculus rules to the loss function $L$ and derive its gradient $\nabla L$ analytically. After that, we can plug in the coordinates (the pixel values of the image) and get the exact gradient
Sketch of Implementation
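As a hypothetical sketch (the notes' own implementation is not shown here): suppose the loss is $L(w) = \sum_i w_i^2$. Applying calculus rules once gives the closed-form gradient $\nabla L = 2w$, so evaluating the gradient at any point is a single expression rather than $n$ loss evaluations.

```python
import numpy as np

def loss(w):
    """Toy loss L(w) = sum of squared weights."""
    return np.sum(w ** 2)

def analytic_gradient(w):
    """Exact gradient of the toy loss, derived by hand: dL/dw = 2w."""
    return 2 * w

w = np.array([1.0, -2.0, 3.0])
g = analytic_gradient(w)  # exact value, one cheap pass over w
```

In a real network the hand derivation is replaced by backpropagation, but the idea is the same: the gradient formula is derived once, then evaluated exactly.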
Advantages & Disadvantages
Advantages:
- Fast: computed efficiently via backpropagation
- Gives the exact gradient value rather than an approximation
Disadvantage:
- Error-Prone: Hard to implement, often has bugs
In the real world, we'll almost always use the analytic gradient rather than the numeric gradient
Gradient Check
We've mentioned that the analytic gradient is error-prone while the numeric gradient is easy to implement. Hence, we can use the numeric gradient to check the correctness of the analytic gradient. This step is called a “gradient check”
To overcome the inefficiency of the numeric gradient, a gradient check instead uses lower-dimensional samples. This way, we can verify the analytic gradient's correctness without wasting much time
PyTorch offers the `torch.autograd.gradcheck` function for gradient checks