3D Representation
Depth Map
A depth map gives, for each pixel, the distance from the camera to the object visible at that pixel
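As an illustration of what a depth map encodes, here is a minimal sketch that lifts each pixel to a 3D point. It assumes a pinhole camera and that each depth value is measured along the optical axis. The function name `backproject` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and do not come from these notes.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (list of rows of z-depths) into 3D camera-frame points.

    Assumes a pinhole camera with focal lengths fx, fy and principal point
    (cx, cy), and that each depth value is measured along the optical axis.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# A tiny 2x2 depth map: three pixels at depth 2, one at depth 4.
depth = [[2.0, 2.0], [2.0, 4.0]]
points = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points)
```

Each pixel yields exactly one 3D point, so a depth map only describes the surfaces visible from the camera.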
Surface Normals
Surface normals define a vector perpendicular to the object’s surface for every pixel in the image
Voxel Grid
Represents a 3D object as a regular grid of small blocks (voxels), each marked as occupied or empty
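A minimal sketch of a voxel grid, using a sphere as a hypothetical test shape. The grid bounds, resolution, and the rule "a voxel is occupied when its center is inside the shape" are assumptions made for this example.

```python
def voxelize_sphere(resolution=8, radius=0.8):
    """Return a resolution^3 nested list of 0/1 occupancies for a sphere.

    The grid covers the cube [-1, 1]^3; a voxel is occupied when its
    center lies inside the sphere.
    """
    step = 2.0 / resolution
    # Coordinates of the voxel centers along one axis.
    centers = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    return [[[1 if x * x + y * y + z * z <= radius * radius else 0
              for z in centers]
             for y in centers]
            for x in centers]


grid = voxelize_sphere()
occupied = sum(v for plane in grid for row in plane for v in row)
print(f"{occupied} of {8 ** 3} voxels are occupied")
```

Memory grows with the cube of the resolution, which is why fine detail is expensive to store in this representation.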
Implicit Surface
We define a function $f$ that takes a 3D coordinate $x$ and outputs 1, 1/2, or 0:
- 1 means the coordinate is inside the object
- 0 means the coordinate is outside the object
- 1/2 means the coordinate is on the object's surface
The object surface can therefore be expressed as the level set $\{x : f(x) = 1/2\}$
This function is learned during training
Implicit surfaces allow multiscale outputs, for example with octrees
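The sketch below uses a hand-written sphere function as a stand-in for the learned function, since the notes do not specify a network. The names `sphere_occupancy` and `sample_grid` are hypothetical. It shows that the same function can be queried at any resolution.

```python
import math


def sphere_occupancy(x, y, z, radius=1.0, tol=1e-9):
    """Hand-written stand-in for a learned implicit function.

    Returns 1.0 inside the sphere, 0.0 outside, and 0.5 on the surface.
    """
    d = math.sqrt(x * x + y * y + z * z)
    if abs(d - radius) <= tol:
        return 0.5
    return 1.0 if d < radius else 0.0


def sample_grid(f, resolution):
    """Query f at the centers of a resolution^3 grid over [-1, 1]^3."""
    step = 2.0 / resolution
    centers = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    return [[[f(x, y, z) for z in centers] for y in centers] for x in centers]


# The same function evaluated at a coarse and at a fine resolution.
coarse = sample_grid(sphere_occupancy, 4)
fine = sample_grid(sphere_occupancy, 16)
print(sphere_occupancy(0, 0, 0), sphere_occupancy(1, 0, 0), sphere_occupancy(2, 0, 0))
```

Because the shape lives in the function rather than in a fixed grid, resolution is chosen at query time instead of being baked into the representation.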
Point Cloud
Represents the object as a set of 3D points on its surface
Triangle Mesh
Represents the surface of the object with points (vertices) that are connected to each other to form triangular faces
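A minimal sketch of how a triangle mesh is commonly stored: a list of vertices plus a list of faces that index into it. The tetrahedron and the helper `triangle_area` are illustrative choices, not part of these notes.

```python
import math

# Four vertices of a tetrahedron, each an (x, y, z) position.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Each face lists the indices of its three vertices.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]


def triangle_area(a, b, c):
    """Area of triangle abc: half the norm of the cross product (b-a) x (c-a)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(k * k for k in cross))


surface_area = sum(triangle_area(*(vertices[i] for i in face)) for face in faces)
print(surface_area)
```

Unlike a point cloud, the faces make the surface explicit, so quantities such as surface area can be computed directly.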
Shape Comparison Metrics
Intersection over Union (IoU)
Problems:
- Struggles to capture thin structures
- Cannot be applied to point clouds, since they have no volume
- A small difference in IoU doesn’t provide meaningful information
- Triangle meshes must first be converted to voxel grids before IoU can be computed
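A minimal sketch of IoU on voxelized shapes, with each shape given as a set of occupied voxel indices. The function name `voxel_iou` and the convention for two empty shapes are assumptions for this example.

```python
def voxel_iou(pred, target):
    """IoU between two shapes given as sets of occupied (i, j, k) voxel indices."""
    union = pred | target
    if not union:
        return 1.0  # convention assumed here: two empty shapes match
    return len(pred & target) / len(union)


# Two 2x2x2 blocks, offset from each other by one voxel along the first axis.
block_a = {(i, j, k) for i in range(0, 2) for j in range(2) for k in range(2)}
block_b = {(i, j, k) for i in range(1, 3) for j in range(2) for k in range(2)}

# Two one-voxel-thick plates, also offset by one voxel.
plate_a = {(0, j, k) for j in range(4) for k in range(4)}
plate_b = {(1, j, k) for j in range(4) for k in range(4)}

print(voxel_iou(block_a, block_b))
print(voxel_iou(plate_a, plate_b))
```

The blocks share 4 of 12 voxels, giving an IoU of 1/3. The plates are shifted by the same amount but share no voxels, so their IoU is 0, which illustrates why thin structures are hard for this metric.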
Chamfer Distance
Problem:
- A few badly placed points (outliers) can dramatically skew the entire metric
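The notes do not give a formula, so the sketch below assumes one common form of the Chamfer distance: the mean squared nearest-neighbor distance, summed over both directions. Other variants exist, using unsquared distances or sums instead of means. The point sets are made up for illustration.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of 3-tuples).

    Uses the mean squared distance to the nearest neighbor in each direction.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Average, over src, of the squared distance to the closest dst point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Ground truth: ten points on a line.
target = [(float(i), 0.0, 0.0) for i in range(10)]
# A good prediction: every point is off by 0.1.
good = [(x + 0.1, y, z) for x, y, z in target]
# The same prediction with one point moved far away.
with_outlier = good[:-1] + [(100.0, 0.0, 0.0)]

print(chamfer_distance(good, target))
print(chamfer_distance(with_outlier, target))
```

Moving a single point out of ten increases the distance by several orders of magnitude, which is the sensitivity described above.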
F1 Score
Precision & Recall
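For a distance threshold $t$:
Precision@t = fraction of predicted points within distance $t$ of some ground-truth point
Recall@t = fraction of ground-truth points within distance $t$ of some predicted point
We compute the output by
$$\text{F1@}t = \frac{2 \cdot \text{Precision@}t \cdot \text{Recall@}t}{\text{Precision@}t + \text{Recall@}t}$$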
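The F1 score is the best shape prediction metric in common use: because precision and recall only count points within the threshold, a few outliers cannot dominate the score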
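A minimal sketch of precision, recall, and F1 between two point sets, where `t` denotes the distance threshold. The function name `f1_score` and the example points are assumptions for illustration.

```python
import math


def f1_score(pred, gt, t):
    """Return (precision, recall, F1) at distance threshold t for two point sets."""
    def frac_within(src, dst):
        # Fraction of src points whose nearest dst point is within distance t.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= t)
        return hits / len(src)

    precision = frac_within(pred, gt)
    recall = frac_within(gt, pred)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Ground truth: ten points on a line.
gt = [(float(i), 0.0, 0.0) for i in range(10)]
# Prediction: nine points off by 0.1, plus one point placed far away.
pred = [(i + 0.1, 0.0, 0.0) for i in range(9)] + [(100.0, 0.0, 0.0)]

precision, recall, f1 = f1_score(pred, gt, t=0.2)
print(precision, recall, f1)
```

The far-away point costs one miss in precision and one in recall, but how far away it is does not matter, so the score stays high.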
Camera System
Canonical Coordinates
Introduction
Definition: A fixed, standard coordinate system where objects are always oriented in the same way, regardless of how they appear in the input image.
Example: Regardless of which way a chair faces in the image, we always predict it facing the same direction
Problem
Neural networks learn by associating input features with output predictions. With canonical coordinates, however, the predicted shape is no longer spatially aligned with the input image, which makes it harder for the network to learn a consistent mapping
View Coordinates
Definition: A coordinate system aligned with the camera’s viewpoint - objects are oriented relative to how the camera sees them.
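To make the difference between the two coordinate systems concrete, the sketch below rotates a canonical-frame direction into the view frame. It assumes the camera pose is a pure rotation about the vertical (y) axis; real camera poses also include translation and other rotation axes. The function name `canonical_to_view` is hypothetical.

```python
import math


def canonical_to_view(points, yaw_degrees):
    """Rotate canonical-frame points about the y (up) axis into the view frame.

    Assumes the camera pose is a pure rotation by yaw_degrees.
    """
    a = math.radians(yaw_degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


# In canonical coordinates the chair always faces +z.
facing_canonical = [(0.0, 0.0, 1.0)]

# In view coordinates its facing direction depends on the camera.
facing_view_0 = canonical_to_view(facing_canonical, 0)[0]
facing_view_90 = canonical_to_view(facing_canonical, 90)[0]
print(facing_view_0, facing_view_90)
```

A network predicting in canonical coordinates would output the same shape for both camera poses, while one predicting in view coordinates outputs a shape rotated to match what the image shows.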
Datasets
ShapeNet
Cons:
- Objects are isolated, with no surrounding context
Pix3D
Pros:
- Real images with context
Cons:
- Only 1 object per image
Mesh R-CNN
Motivation
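Topology tells us that an ellipsoid cannot be continuously deformed into a doughnut shape. This restricts Pixel2Mesh, which starts from an ellipsoid mesh and deforms it, so its output always keeps the topology of the initial mesh
Mesh R-CNN resolves this problem by changing the way we initialize the input of Pixel2Mesh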
Implementation
- Predict a coarse 3D shape of the object as a voxel grid
- Sample the surface of the voxelized object to create an initial triangle mesh
- Run Pixel2Mesh on this mesh to get a more accurate object
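The second step above can be sketched as follows. This is a simplified illustration of one way to turn a voxel grid into a triangle mesh, not a reproduction of Mesh R-CNN's own conversion. The function name `voxels_to_mesh` is hypothetical, and the voxel prediction and mesh refinement networks are not shown.

```python
# Offsets to the six face-adjacent neighbors of a voxel, each paired with the
# four corner offsets (in cyclic order) of the face shared with that neighbor.
FACES = {
    (-1, 0, 0): [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],
    (1, 0, 0): [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    (0, -1, 0): [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],
    (0, 1, 0): [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],
    (0, 0, -1): [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],
    (0, 0, 1): [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],
}


def voxels_to_mesh(occupied):
    """Convert a set of occupied (i, j, k) voxels into (vertices, triangles).

    Only faces between an occupied voxel and an empty neighbor are kept,
    and each square face is split into two triangles.
    """
    vertex_index = {}
    triangles = []

    def vid(corner):
        # Reuse a vertex when several faces share the same corner.
        return vertex_index.setdefault(corner, len(vertex_index))

    for (i, j, k) in sorted(occupied):
        for (di, dj, dk), corners in FACES.items():
            if (i + di, j + dj, k + dk) in occupied:
                continue  # interior face, hidden inside the shape
            a, b, c, d = (vid((i + ci, j + cj, k + ck)) for ci, cj, ck in corners)
            triangles += [(a, b, c), (a, c, d)]

    vertices = sorted(vertex_index, key=vertex_index.get)
    return vertices, triangles


single = voxels_to_mesh({(0, 0, 0)})
pair = voxels_to_mesh({(0, 0, 0), (1, 0, 0)})
print(len(single[0]), len(single[1]))
print(len(pair[0]), len(pair[1]))
```

Because the mesh is built from whatever voxels are occupied, its topology follows the predicted shape (a ring of voxels yields a mesh with a hole) instead of being fixed by an initial ellipsoid.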