Introduction
Importance
AlexNet was introduced in the 2012 ImageNet classification challenge. It was the first time that researchers showed a deep CNN could perform well on large-scale image recognition tasks.
Structure
- The input goes through 5 convolutional layers
- The output is then flattened into a vector, which discards the spatial layout
- Finally, the vector passes through 3 fully-connected layers, which return the result
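The flow above can be traced shape by shape. The sketch below is only illustrative: the notes do not list layer hyperparameters, so the 227x227 input, the filter counts, kernel sizes, strides, padding, and the three max-pooling layers are all assumptions taken from a commonly cited single-stream AlexNet variant, and the names `LAYERS`, `FC_SIZES`, `out_size`, and `trace_shapes` are hypothetical.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial size after a conv or pooling layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kind, out_channels, kernel, stride, pad)
# ASSUMPTION: these hyperparameters are not given in the notes.
LAYERS = [
    ("conv1", "conv", 64, 11, 4, 2),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 192, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 256, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]
FC_SIZES = [4096, 4096, 1000]  # ASSUMPTION: widths of the 3 FC layers


def trace_shapes(channels=3, size=227):
    """Return (layer name, output shape) pairs for one input image."""
    shapes = [("input", (channels, size, size))]
    for name, kind, out_ch, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if kind == "conv":  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, (channels, size, size)))
    # Flattening turns the C x H x W volume into a single vector.
    shapes.append(("flatten", (channels * size * size,)))
    for index, width in enumerate(FC_SIZES, start=6):
        shapes.append((f"fc{index}", (width,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

Under these assumptions the last pooled volume is 256 x 6 x 6, so the flattened vector fed to the first FC layer has 9216 entries.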
Resource Usage

Memory
Memory records the size of each layer's output
- The Conv layers, especially the early ones, use the most memory because they process high-resolution feature maps
- Memory generally decreases as we go deeper due to the pooling operations
- The FC layers have low memory usage since they work with flattened, low-dimensional vectors
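As a rough check of these points, the output memory of each layer can be estimated as (number of output elements) x (bytes per element). The sketch below assumes 32-bit floats and a set of output shapes that are not stated in the notes; they come from a commonly cited AlexNet variant with a 227x227 input. `OUTPUT_SHAPES`, `output_kib`, and `memory_kib` are hypothetical names.

```python
import math

BYTES_PER_FLOAT32 = 4  # ASSUMPTION: activations stored as 32-bit floats

# (layer name, output shape for one image)
# ASSUMPTION: shapes are not given in the notes.
OUTPUT_SHAPES = [
    ("conv1", (64, 56, 56)),
    ("pool1", (64, 27, 27)),
    ("conv2", (192, 27, 27)),
    ("pool2", (192, 13, 13)),
    ("conv3", (384, 13, 13)),
    ("conv4", (256, 13, 13)),
    ("conv5", (256, 13, 13)),
    ("pool5", (256, 6, 6)),
    ("fc6", (4096,)),
    ("fc7", (4096,)),
    ("fc8", (1000,)),
]


def output_kib(shape):
    """Memory of one layer output in KiB (1 KiB = 1024 bytes)."""
    return math.prod(shape) * BYTES_PER_FLOAT32 / 1024


memory_kib = {name: output_kib(shape) for name, shape in OUTPUT_SHAPES}

for name, kib in memory_kib.items():
    print(f"{name:6s} {kib:8.2f} KiB")
```

With these assumed shapes, conv1 produces the largest output (784 KiB per image) while each 4096-wide FC output needs only 16 KiB. The trend is downward overall but not strictly monotonic, since a Conv layer that increases the channel count can raise memory again after a pooling step.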
Parameters
This records the number of learnable parameters
- The FC layers contain the vast majority of the parameters
- This is because FC layers connect every input to every output, creating massive weight matrices
- In Conv layers, one filter is applied at every position of the feature map, which implies the weights are shared, making these layers parameter-efficient
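The two counting rules behind these bullets are simple: a Conv layer has C_out x C_in x K x K weights plus C_out biases, independent of the feature-map size, while an FC layer has N_in x N_out weights plus N_out biases. The sketch below applies them to layer sizes that are assumptions (a commonly cited AlexNet variant), since the notes do not give them; `CONV`, `FC`, `conv_params`, and `fc_params` are hypothetical names.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a conv layer; independent of image size."""
    return c_out * c_in * kernel * kernel + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


# ASSUMPTION: layer sizes are not given in the notes.
CONV = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
FC = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(*layer) for layer in CONV)
fc_total = sum(fc_params(*layer) for layer in FC)
fc_share = fc_total / (conv_total + fc_total)

print(f"conv parameters: {conv_total:,}")
print(f"fc parameters:   {fc_total:,}")
print(f"fc share:        {fc_share:.1%}")
```

Under these assumptions the first FC layer alone (9216 x 4096 weights) holds more parameters than all five Conv layers combined.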
FLOP
This records the number of floating-point operations
- Most floating-point operations occur in the Conv layers
- Even though FC layers have more parameters, each FC weight is used only once per input, whereas each Conv filter weight is reused at every spatial position of the output
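To see why compute and parameter counts diverge, one can count multiply-accumulate operations (MACs): a Conv layer performs C_in x K x K of them for every output element, while an FC layer performs exactly one per weight. The sketch below is a hedged estimate that ignores biases, activations, and pooling; the layer sizes and output resolutions are assumptions from a commonly cited AlexNet variant, and `CONV`, `FC`, `conv_macs`, and `fc_macs` are hypothetical names.

```python
def conv_macs(c_in, c_out, kernel, out_hw):
    """Multiply-accumulates of a conv layer with a square out_hw x out_hw output."""
    output_elements = c_out * out_hw * out_hw
    macs_per_element = c_in * kernel * kernel
    return output_elements * macs_per_element


def fc_macs(n_in, n_out):
    """Multiply-accumulates of an FC layer: one per weight."""
    return n_in * n_out


# (c_in, c_out, kernel, output height/width)
# ASSUMPTION: layer sizes are not given in the notes.
CONV = [
    (3, 64, 11, 56),
    (64, 192, 5, 27),
    (192, 384, 3, 13),
    (384, 256, 3, 13),
    (256, 256, 3, 13),
]
FC = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_macs(*layer) for layer in CONV)
fc_total = sum(fc_macs(*layer) for layer in FC)

print(f"conv MACs: {conv_total:,}")
print(f"fc MACs:   {fc_total:,}")
print(f"conv share: {conv_total / (conv_total + fc_total):.1%}")
```

With these assumed sizes the Conv layers account for more than ten times the MACs of the FC layers, despite holding only a small fraction of the parameters.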