Single Object Detection
Steps
Step 1: Pass the image into a pretrained CNN
This step uses the technique of transfer learning, the CNN is often pretrained on ImageNet
Step 2: Predict
After flattening the Conv outputs, we use two parallel FC layers instead of one, one of them predicts the category of the object, the other one predicts the position of the object
Category Prediction: Same as original CNN Position Prediction: Predict the box enclosing the object, i.e.,
Step 3: Optimization
- We compute the loss of two FC layers separately
- Next, we combine them with weighted sum to get final loss
- Use backpropagation to update learnable parameters

Comment
This method works pretty well on single object detection. However, this method can’t work for multiple objects detection