R-CNN
Introduction
R-CNN is a multi-object detection method that leverages the concept of “region proposals”
Steps
Step 1: Region Proposals
Run a region proposal method (Selective Search in the original R-CNN) to compute ~2000 candidate regions per image
Step 2: Make Predictions
Resize each region to the CNN’s input size, then pass each one through the CNN to predict class scores and a bounding box transform
Step 3: Select Output
In this step we select a subset of region proposals to output, based on the predicted scores and a selection rule of our choice (for example, a score threshold or keeping the top K per image)
Step 4: Compare boxes
We use Intersection over Union (IoU) to compare each predicted box with the manually annotated ground-truth box: the area of their intersection divided by the area of their union
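A minimal sketch of the IoU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates with x1 < x2 and y1 < y2. The function name `iou` and the box format are illustrative choices, not something fixed by these notes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the doubly counted intersection.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


prediction = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(prediction, ground_truth))  # overlap area 1, union area 7
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap at all.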

Bounding Box Transform Prediction
Explanation
This part of the output predicts a “transform” that is applied to the region proposal in order to correct its size and position
A Common Practice
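We predict the “transform” rather than the output box directly. In the standard R-CNN parameterization, boxes are described by their center and size:
Region Proposal: $(p_x, p_y, p_w, p_h)$ Transform: $(t_x, t_y, t_w, t_h)$ Output Box: $(b_x, b_y, b_w, b_h)$
Translate relative to the box size: $b_x = p_x + p_w t_x$, $b_y = p_y + p_h t_y$
Log-space scale transform: $b_w = p_w \exp(t_w)$, $b_h = p_h \exp(t_h)$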
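A minimal sketch of applying a predicted transform to a region proposal. It assumes the standard R-CNN parameterization: boxes are (center x, center y, width, height), the translation is scaled by the proposal's width and height, and the scale is predicted in log space. The function name `apply_transform` is hypothetical.

```python
import math


def apply_transform(proposal, transform):
    """Map a proposal (px, py, pw, ph) and transform (tx, ty, tw, th) to an output box."""
    px, py, pw, ph = proposal   # center x, center y, width, height
    tx, ty, tw, th = transform

    # Translation is relative to the proposal size, so it is scale invariant.
    bx = px + pw * tx
    by = py + ph * ty

    # Scale is predicted in log space, so the output size is always positive.
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# A zero transform leaves the proposal unchanged.
print(apply_transform((10.0, 20.0, 4.0, 8.0), (0.0, 0.0, 0.0, 0.0)))
```

One consequence of this design is that an all-zero transform reproduces the proposal exactly, so the network only has to learn corrections.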
Overlapping Boxes
Problem
Object detectors often output many overlapping bounding boxes that detect the same object multiple times
Solution: Non-Max Suppression (NMS)
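A minimal sketch of one common greedy formulation of NMS: visit boxes from highest to lowest score and keep a box only if it does not overlap an already kept box by more than an IoU threshold. The 0.5 threshold, the (x1, y1, x2, y2) box format, and the function names are assumptions made for illustration. In practice NMS is usually run separately for each class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-max suppression."""
    # Process boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap any higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

A known weakness of this approach is that it can also remove correct boxes when distinct objects genuinely overlap a lot, such as people in a crowd.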
Evaluating Object Detectors
Problem
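Object detection is very different from image classification: a detector outputs a variable number of scored boxes per image, so plain classification accuracy does not apply and we need a different way of evaluating how good our object detector is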
Solution: Mean Average Precision (mAP)
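A rough sketch of one common way to compute mAP, given here as an illustration rather than the exact protocol of any particular benchmark. For each class, detections are sorted by score, a detection counts as a true positive if it matches a not-yet-matched ground-truth box with IoU above a threshold (0.5 is assumed here), and AP is the area under the interpolated precision-recall curve. mAP is the mean of AP over classes. The data layout and function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    total_gt = sum(len(b) for b in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    precisions, recalls = [], []
    tp = fp = 0
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        # Find the best still-unmatched ground-truth box in the same image.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth.get(img, [])):
            if not matched[img][idx]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_idx = overlap, idx
        if best_iou > iou_threshold:
            matched[img][best_idx] = True  # each ground-truth box matches once
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_gt)
    # Interpolate: precision at a recall is the max precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


gt = {"img0": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img0", 0.9, (0, 0, 10, 10)),
        ("img0", 0.8, (50, 50, 60, 60)),
        ("img0", 0.7, (20, 20, 30, 30))]
print(average_precision(dets, gt))
```

Benchmarks differ in the details, for example in how the curve is interpolated and in whether AP is additionally averaged over several IoU thresholds.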
Problem of R-CNN
It is very slow, since we need a separate CNN forward pass for each region proposal: ~2000 forward passes per image