1. Object Detection Introduction
1.1 Task Definition
Input: Single RGB image
Output: A set of detected objects, each with:
- Category label (from a fixed, known set of categories)
- Bounding box (x, y, width, height)
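The output format above can be sketched as a small data structure. This is only an illustration: the `Detection` class and its field names are hypothetical, and it assumes (x, y) is the top-left corner of the box in pixels (some conventions use the box center instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str     # category from the fixed, known label set
    x: float       # left edge of the box (assumed top-left convention)
    y: float       # top edge of the box
    width: float
    height: float

    def to_corners(self):
        """Return (x1, y1, x2, y2), the corner form used by many IoU/NMS routines."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


# The detector output is a variable-length list, one entry per detected object.
detections = [
    Detection("cat", 10.0, 20.0, 100.0, 50.0),
    Detection("dog", 150.0, 40.0, 80.0, 120.0),
]
corners = [d.to_corners() for d in detections]
```

Converting to corner form early tends to simplify overlap computations such as IoU.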
1.2 Challenges
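- Multiple outputs: The number of objects varies from image to image, so the model must produce a variable number of outputs
- Multiple types of output: The model must predict both “what” each object is (category label) and “where” it is (bounding box)
- Large images: Images containing multiple objects usually need to be processed at a higher resolution than in classification, so that each object stays large enough to recognize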
2. Detecting a Single Object
D-DL4CV-Lec15a-Single_Object_Detection
3. Detecting Multiple Objects
3.1 Sliding Windows (Bad Approach)
D-DL4CV-Lec15b-Detecting_Multiple_Objects_with_Sliding_Windows
3.2 Region Proposals
Method: Use an algorithm that finds a small set of boxes likely to cover all objects in the image, then run a classifier on each of these boxes
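A minimal sketch of the propose-then-classify idea, assuming boxes are integer (x, y, width, height) tuples. The names `detect_with_proposals`, `propose_regions`, and `classify_crop` are hypothetical, and the toy stand-ins below exist only so the sketch runs; a real system would plug in an actual proposal algorithm and a CNN classifier.

```python
import numpy as np


def detect_with_proposals(image, propose_regions, classify_crop, background="background"):
    """Classify every proposed box and keep those that are not background."""
    detections = []
    for (x, y, w, h) in propose_regions(image):
        crop = image[y:y + h, x:x + w]       # cut the proposed region out of the image
        label, score = classify_crop(crop)   # classify the crop on its own
        if label != background:
            detections.append((label, score, (x, y, w, h)))
    return detections


# Toy stand-ins (assumptions for illustration only)
def toy_proposals(image):
    return [(0, 0, 4, 4), (4, 4, 4, 4)]


def toy_classifier(crop):
    return ("object", 1.0) if crop.mean() > 0.5 else ("background", 1.0)


image = np.zeros((8, 8), dtype=np.float32)
image[4:8, 4:8] = 1.0  # a bright square standing in for an object
result = detect_with_proposals(image, toy_proposals, toy_classifier)
```

Compared with sliding windows, the classifier here runs only on the proposed boxes rather than on every possible window.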
3.2.1 R-CNN
- Intersection over Union (IoU)
- Non-Max Suppression (NMS)
- Mean Average Precision (mAP)
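The three concepts above can be sketched in NumPy. This is an illustrative version, not a reference implementation: boxes are assumed to be (x1, y1, x2, y2) with positive area, the 0.5 threshold is an assumed default, and `average_precision` assumes detections have already been matched to ground truth (for example, by IoU above a threshold), which is passed in as `is_true_positive`.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes that overlap it, repeat."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for a single category."""
    order = np.argsort(-scores)
    tp = np.cumsum(is_true_positive[order])
    fp = np.cumsum(~is_true_positive[order])
    recall = tp / num_ground_truth
    precision = tp / (tp + fp)
    # Make precision non-increasing in recall, then sum rectangle areas
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # box 1 overlaps box 0 heavily, so boxes 0 and 2 remain
ap = average_precision(scores, np.array([True, False, True]), num_ground_truth=2)
```

mAP is then the mean of the per-category AP values; some benchmarks additionally average over several IoU thresholds.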
3.2.2 Fast R-CNN
- RoI Pooling
- RoI Align
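A rough NumPy sketch of RoI Pooling for a single region, assuming the RoI is given as (x1, y1, x2, y2) in feature-map coordinates and spans at least `output_size` cells in each direction. The function name and the 2x2 output size are illustrative choices, not taken from the notes.

```python
import numpy as np


def roi_pool(features, roi, output_size=2):
    """Max-pool one RoI of a (C, H, W) feature map to (C, output_size, output_size).

    The RoI is first snapped to the integer grid; this quantization step is
    what RoI Align is designed to avoid.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in roi)
    region = features[:, y1:y2, x1:x2]
    _, h, w = region.shape
    # Split rows and columns into roughly equal bins (integer edges)
    row_edges = np.linspace(0, h, output_size + 1).astype(int)
    col_edges = np.linspace(0, w, output_size + 1).astype(int)
    out = np.empty((features.shape[0], output_size, output_size), dtype=features.dtype)
    for i in range(output_size):
        for j in range(output_size):
            cell = region[:, row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            out[:, i, j] = cell.max(axis=(1, 2))  # max over each spatial bin
    return out


features = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
pooled = roi_pool(features, roi=(0, 0, 4, 4))
```

RoI Align keeps the RoI coordinates fractional and fills each bin by bilinear interpolation at regularly spaced sample points, so features stay aligned with the original box instead of shifting during rounding.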
3.2.3 Faster R-CNN
- Region Proposal Network (RPN)
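An RPN scores a fixed set of anchor boxes at every position of the backbone feature map. The sketch below only generates those anchors; the stride, scales, and aspect ratios are assumed defaults for illustration, and `generate_anchors` is a hypothetical helper name.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * K, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    # K anchor shapes: each has area scale**2 and height/width equal to ratio
    shapes = np.array([
        (scale / np.sqrt(ratio), scale * np.sqrt(ratio))  # (width, height)
        for scale in scales
        for ratio in ratios
    ])

    # Anchor centers: the middle of each feature-map cell, mapped back to the image
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                           # each (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (feat_h * feat_w, 2)

    half = shapes / 2.0
    top_left = centers[:, None, :] - half[None, :, :]      # (positions, K, 2)
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)


anchors = generate_anchors(feat_h=2, feat_w=3)
```

For each anchor, the network would then predict an objectness score and a box correction; the highest-scoring corrected anchors serve as region proposals.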