1 Semantic Segmentation
1.1 Task Description
We want to label each pixel in the image with a category. At this stage we do not distinguish between different objects of the same category
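As a minimal sketch of what the output of this task looks like, assume some model has already produced a score array of shape (C, H, W), one score per category per pixel. The names `CATEGORIES`, `scores` and `label_map`, and all the values, are made up for illustration. The label map is then just the per-pixel argmax over categories:

```python
import numpy as np

# Hypothetical categories and scores, for illustration only.
CATEGORIES = ["sky", "grass", "cow"]


def label_map(scores: np.ndarray) -> np.ndarray:
    """Turn per-category scores of shape (C, H, W) into an (H, W) map of category indices."""
    return scores.argmax(axis=0)


scores = np.zeros((3, 2, 3))
scores[0, 0, :] = 1.0   # top row scores highest for "sky"
scores[1, 1, :2] = 1.0  # bottom-left pixels score highest for "grass"
scores[2, 1, 2] = 1.0   # bottom-right pixel scores highest for "cow"

labels = label_map(scores)
print(labels)
```

Every pixel gets exactly one category index, and nothing in the output says which object a pixel belongs to.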
1.2 Solutions
1.2.1 Sliding Windows
For each pixel, we crop a small patch around it, pass the patch through a CNN, and label the pixel with the output of the CNN. However, this approach is too slow: the CNN has to be run once per pixel, and overlapping patches repeat most of the same computation
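The procedure above can be sketched as follows. This is only an illustration of the loop structure: `classify_patch` is a hypothetical stand-in for the CNN (it just thresholds the mean brightness), the image is a single-channel array, and padding the border by edge replication is an assumption of this sketch.

```python
import numpy as np


def classify_patch(patch: np.ndarray) -> int:
    """Hypothetical stand-in for a CNN: 1 if the patch is bright on average, else 0."""
    return int(patch.mean() > 0.5)


def sliding_window_segment(image: np.ndarray, patch_size: int = 3):
    """Label every pixel by classifying the patch centered on it.

    patch_size is assumed to be odd. Returns the (H, W) label map and
    the number of classifier calls that were needed.
    """
    half = patch_size // 2
    # Replicate edge pixels so that border pixels also get a full patch.
    padded = np.pad(image, half, mode="edge")
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)
    calls = 0
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + patch_size, x:x + patch_size]
            labels[y, x] = classify_patch(patch)
            calls += 1
    return labels, calls


image = np.zeros((4, 6))
image[:, 3:] = 1.0  # right half of the image is bright
labels, calls = sliding_window_segment(image)
print(labels)
print(calls)  # one classifier call per pixel: 4 * 6
```

The call count equals H × W, so with a real CNN in place of `classify_patch` the cost grows with the number of pixels, even though neighboring patches share almost all of their content.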
1.2.2 Fully Convolutional Network
D-DL4CV-Lec16a-Semantic_Segmentation_Fully_Convolutional_Network
2 Things & Stuff
2.1 Things
Things are object categories that can be separated into object instances
For example: cats, cars, people, …
2.2 Stuff
Stuff refers to object categories that cannot be separated into instances
For example: sky, grass, water, …
3 Instance Segmentation
3.1 Task Description
We extend the task of semantic segmentation. In addition to labeling all pixels of cows as “cow”, we also want to tell “cow 1” apart from “cow 2”
Instance segmentation handles “Things” but not “Stuff”
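One way to picture the difference in output is to store one boolean mask per detected object. The sketch below is only an illustration: the two cow masks are hand-made rather than predicted, and the "0 means no instance" convention is an assumption.

```python
import numpy as np

h, w = 2, 6

# Hypothetical predictions: one (category, boolean mask) pair per object.
cow1 = np.zeros((h, w), dtype=bool)
cow1[:, 0:2] = True
cow2 = np.zeros((h, w), dtype=bool)
cow2[:, 4:6] = True
instances = [("cow", cow1), ("cow", cow2)]

# Instance id map: 0 = no instance, k = the k-th object in the list.
instance_ids = np.zeros((h, w), dtype=int)
for k, (_, mask) in enumerate(instances, start=1):
    instance_ids[mask] = k

# Collapsing to a semantic view loses the distinction between the two cows.
is_cow = instance_ids > 0

print(instance_ids)
print(is_cow.astype(int))
```

Pixels left at 0 belong to no object. In this sketch that is where sky or grass would be, which instance segmentation does not label.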
3.2 Mask R-CNN
4 Beyond Instance Segmentation
4.1 Panoptic Segmentation
Panoptic segmentation is similar to instance segmentation, but it handles “Stuff” in addition to “Things”: every pixel gets a category, and pixels of “Things” also get an instance identity
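A minimal sketch of a panoptic output, assuming we already have a semantic label map and an instance id map. The category ids, the choice of which categories are things, and the `category * 1000 + instance id` encoding are all assumptions made for this illustration.

```python
import numpy as np

# Hypothetical category ids; which ones count as things is an assumption.
SKY, GRASS, COW = 0, 1, 2
THING_IDS = [COW]


def panoptic_map(semantic: np.ndarray, instance_ids: np.ndarray) -> np.ndarray:
    """Encode each pixel as category * 1000 + instance id (instance id is 0 for stuff)."""
    is_thing = np.isin(semantic, THING_IDS)
    ids = np.where(is_thing, instance_ids, 0)
    return semantic * 1000 + ids


semantic = np.array([[SKY, SKY, SKY, SKY],
                     [COW, GRASS, GRASS, COW]])
instance_ids = np.array([[0, 0, 0, 0],
                         [1, 0, 0, 2]])

pan = panoptic_map(semantic, instance_ids)
print(pan)
print(pan // 1000)  # recovers the category of every pixel
print(pan % 1000)   # recovers the instance id (0 for stuff)
```

Stuff pixels carry only a category, while the two cows share a category but keep different instance ids.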
4.2 Human Keypoints
We represent the pose of a human by detecting a set of keypoints on the body
For example: nose, eyes, shoulders, elbows, …
Its implementation is similar to Mask R-CNN: it adds another branch that is applied to each region proposal of Faster R-CNN
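As a hedged sketch of how such a branch's output could be decoded, assume the branch produces one m × m heatmap per keypoint for a given region proposal, and that the predicted location is the highest-scoring cell. The function name `decode_keypoints`, the keypoint subset, the box format and the heatmap values are all hypothetical.

```python
import numpy as np

KEYPOINTS = ["nose", "left_eye", "right_eye"]  # hypothetical subset


def decode_keypoints(heatmaps: np.ndarray, box: tuple) -> list:
    """Map the argmax of each (m, m) heatmap to image coordinates.

    heatmaps: (K, m, m) scores for one region proposal.
    box: (x0, y0, x1, y1) of that proposal in image coordinates.
    """
    k, m, _ = heatmaps.shape
    x0, y0, x1, y1 = box
    points = []
    for i in range(k):
        row, col = np.unravel_index(heatmaps[i].argmax(), (m, m))
        # Use the center of the winning cell, scaled to the box size.
        x = x0 + (col + 0.5) * (x1 - x0) / m
        y = y0 + (row + 0.5) * (y1 - y0) / m
        points.append((float(x), float(y)))
    return points


heatmaps = np.zeros((3, 4, 4))
heatmaps[0, 2, 1] = 1.0  # nose peaks at row 2, column 1
heatmaps[1, 1, 0] = 1.0  # left eye peaks at row 1, column 0
heatmaps[2, 1, 2] = 1.0  # right eye peaks at row 1, column 2
box = (10.0, 20.0, 50.0, 60.0)

points = decode_keypoints(heatmaps, box)
for name, point in zip(KEYPOINTS, points):
    print(name, point)
```

Each keypoint is handled like a tiny one-hot mask inside the proposal, which is why the same per-region branch structure used for masks carries over.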