3D Representation

Depth Map

A depth map gives, for each pixel, the distance from the camera to the object visible at that pixel

D-DL4CV-Lec17a-Depth_Map

Surface Normals

A surface normal map gives, for every pixel in the image, a vector perpendicular to the object’s surface at that pixel

D-DL4CV-Lec17b-Surface_Normals

Voxel Grid

A voxel grid represents a 3D object as a regular grid of small blocks (voxels), each marked as occupied or empty

D-DL4CV-Lec17c-Voxel_Grid
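As a rough illustration of the idea, the sketch below voxelizes a sphere into an occupancy grid. The function name `voxelize_sphere`, the grid size, and the choice of testing each voxel's center are assumptions made for this example, not something taken from the lecture.

```python
def voxelize_sphere(n=8, radius=0.8):
    """Return an n x n x n occupancy grid (nested lists of bool) for a sphere.

    The grid covers the cube [-1, 1]^3. A voxel is marked occupied when its
    center lies inside the sphere of the given radius centered at the origin.
    """
    step = 2.0 / n
    centers = [-1.0 + (i + 0.5) * step for i in range(n)]
    return [[[x * x + y * y + z * z <= radius * radius for z in centers]
             for y in centers]
            for x in centers]


grid = voxelize_sphere()
# Count occupied voxels (True counts as 1).
occupied = sum(v for plane in grid for row in plane for v in row)
```

Memory grows as n^3, so doubling the resolution costs eight times as many voxels.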

Implicit Surface

We learn a function that takes a 3D coordinate and outputs a value between 0 and 1, interpreted as the probability that the point lies inside the object:

1 means the coordinate is inside the object
0 means outside
1/2 means on the object’s surface

$$o : \mathbb{R}^{3} \to [0, 1]$$

The object surface can be expressed as the level set

$$\{x : o(x) = \frac{1}{2}\}$$

This function is learned during training

Because the function can be evaluated at any point, an implicit surface allows multiscale outputs such as octrees
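A minimal sketch of such a function, using a hand-written sphere instead of a trained network. The names `occupancy` and `sample_grid`, the `sharpness` parameter, and the sigmoid-of-signed-distance form are assumptions chosen so that the value is exactly 1/2 on the surface.

```python
import math


def occupancy(x, y, z, radius=1.0, sharpness=10.0):
    """Soft occupancy of a sphere: near 1 inside, near 0 outside, 1/2 on the surface."""
    signed_distance = math.sqrt(x * x + y * y + z * z) - radius  # negative inside
    return 1.0 / (1.0 + math.exp(sharpness * signed_distance))


def sample_grid(n, extent=1.5):
    """Threshold the occupancy at 1/2 on an n x n x n grid over [-extent, extent]^3."""
    step = 2.0 * extent / n
    c = [-extent + (i + 0.5) * step for i in range(n)]
    return [[[occupancy(x, y, z) > 0.5 for z in c] for y in c] for x in c]


# The same function can be queried at any resolution.
coarse = sample_grid(4)
fine = sample_grid(16)
```

In a learned model, `occupancy` would be replaced by the trained network; the sampling step stays the same.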

Point Cloud

A point cloud represents an object as a set of 3D points on its surface

D-DL4CV-Lec17d-Point_Cloud

Triangle Mesh

A triangle mesh represents the surface of an object with points (vertices) that are connected to each other to form triangular faces

D-DL4CV-Lec17e-Triangle_Mesh
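One common way to store a triangle mesh is a list of vertices plus a list of faces that index into it. The sketch below builds a tetrahedron this way and sums the face areas; the variable names and the example shape are assumptions for illustration.

```python
import math

# Vertices: 3D points. Taken alone, they would form a (tiny) point cloud.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Faces: each triple holds the indices of the three vertices of one triangle.
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]


def triangle_area(a, b, c):
    """Area of the triangle abc: half the norm of the cross product (b-a) x (c-a)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(k * k for k in cross))


surface_area = sum(triangle_area(*(vertices[i] for i in face)) for face in faces)
```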

Shape Comparison Metrics

Intersection over Union (IoU)

Problems:

Struggles to capture thin structures
Cannot be applied to point clouds, since they have no volume
Small differences in IoU do not provide meaningful information
Triangle meshes must first be converted to a voxel grid before IoU can be computed
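For reference, IoU is the volume of the intersection of two shapes divided by the volume of their union. A minimal sketch on voxel grids, assuming each shape is given as a set of occupied voxel indices (the function name `voxel_iou` and the empty-case convention are assumptions):

```python
def voxel_iou(pred, target):
    """IoU between two sets of occupied voxel indices (i, j, k)."""
    union = pred | target
    if not union:
        return 1.0  # both shapes empty: treated as a perfect match (assumed convention)
    return len(pred & target) / len(union)


pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
target = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
iou = voxel_iou(pred, target)  # 2 shared voxels out of 4 in the union

# Thin structure: a one-voxel-thick rod predicted one voxel off has no overlap at all.
rod = {(i, 0, 0) for i in range(10)}
shifted_rod = {(i, 1, 0) for i in range(10)}
rod_iou = voxel_iou(rod, shifted_rod)
```

The rod example illustrates the thin-structure problem: a prediction that is off by a single voxel scores the same as one that is completely wrong.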

Chamfer Distance

Problem:

A few badly placed points (outliers) can dramatically skew the entire metric
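The Chamfer distance compares two point sets by matching each point to its nearest neighbor in the other set, in both directions. Several variants exist; the sketch below assumes the mean squared nearest-neighbor distance in each direction, and uses a brute-force search that is only suitable for small point sets.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of 3-tuples).

    Assumed variant: mean squared nearest-neighbor distance from a to b,
    plus the same from b to a. Cost is O(len(a) * len(b)).
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
perfect = chamfer_distance(gt, gt)                      # identical sets
with_outlier = chamfer_distance(gt, gt + [(10.0, 0.0, 0.0)])  # one stray point
```

In this example a single stray point moves the distance from 0 to 27, even though every other point is placed exactly.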

F1 Score

Precision & Recall

Precision@$t$ = fraction of predicted points within $t$ of some ground-truth point
Recall@$t$ = fraction of ground-truth points within $t$ of some predicted point

We compute the output $F1@t$ by

$$F1@t = 2 \cdot \frac{\mathrm{Precision}@t \cdot \mathrm{Recall}@t}{\mathrm{Precision}@t + \mathrm{Recall}@t}$$
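A minimal sketch of this computation on small point sets, using a brute-force nearest-point check. The function name `f1_at_t`, the use of Euclidean distance, and returning 0 when both precision and recall are 0 are assumptions for this example.

```python
import math


def f1_at_t(pred, gt, t):
    """F1 score at threshold t between predicted and ground-truth point sets."""
    def within(p, others):
        # True if some point in `others` lies within distance t of p.
        return any(math.dist(p, q) <= t for q in others)

    precision = sum(within(p, gt) for p in pred) / len(pred)
    recall = sum(within(q, pred) for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0  # assumed convention when nothing matches
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # last point is an outlier
score = f1_at_t(pred, gt, t=0.1)  # precision 2/3, recall 1
```

The outlier counts as one missed prediction regardless of how far away it is, so it lowers precision without dominating the score.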

Because it is robust to outliers, the F1 score is arguably the best shape prediction metric in common use

Camera System

Canonical Coordinates

Introduction

Definition: A fixed, standard coordinate system where objects are always oriented in the same way, regardless of how they appear in the input image.

Example: Regardless of which way a chair faces in the image, we always predict it facing the same direction

Problem

Neural networks learn by associating input features with output predictions. With canonical coordinates, the predicted shape is no longer spatially aligned with the input image, so it becomes harder for the network to learn consistent mappings

View Coordinates

Definition: A coordinate system aligned with the camera’s viewpoint - objects are oriented relative to how the camera sees them.
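The two coordinate systems are related by the camera pose. As a simplified sketch, assume the pose is just a rotation about the vertical (y) axis plus a translation; a real camera has a full 3D rotation. The function name `canonical_to_view` and its parameters are assumptions for illustration.

```python
import math


def canonical_to_view(points, yaw, translation):
    """Map canonical-frame points into the view frame.

    Simplifying assumption: the pose is a rotation by `yaw` radians about
    the y-axis followed by a translation (tx, ty, tz).
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x + s * z + tx, y + ty, -s * x + c * z + tz)
            for x, y, z in points]


# In the canonical frame the front of the object always points along +z.
front = [(0.0, 0.0, 1.0)]
# Seen by a camera for which the object is turned 90 degrees and 5 units away.
in_view = canonical_to_view(front, yaw=math.pi / 2, translation=(0.0, 0.0, 5.0))
```

A model predicting in view coordinates outputs points like `in_view`, which change with the camera; a model predicting in canonical coordinates outputs points like `front`, which do not.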

Datasets

ShapeNet

Cons:

Objects are isolated, with no surrounding context

Pix3D

Pros:

Real images with context

Cons:

Only 1 object per image

Mesh R-CNN

Motivation

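Pixel2Mesh produces its output by deforming an initial ellipsoid mesh, and deformation cannot change topology: an ellipsoid cannot be turned into a doughnut shape. This restricts the shapes Pixel2Mesh can represent

Mesh R-CNN resolves this problem by changing the way we initialize the input mesh of Pixel2Mesh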

Implementation

Predict a coarse voxel grid for the 3D object
Convert the surface of the voxelized object into an initial triangle mesh
Run Pixel2Mesh-style refinement on this mesh to get a more accurate object

Chilfox


D-DL4CV-Lec17-3D_Vision
