What is a Point Cloud?

Introduction

We represent the surface of an object as a set of points and store the coordinates of each point.
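As a rough illustration of this representation, a point cloud can be held as an array with one row of (x, y, z) coordinates per point. The sketch below samples points on a unit sphere; the function name `sample_sphere_point_cloud` and the choice of a sphere are assumptions made only for the example.

```python
import numpy as np

def sample_sphere_point_cloud(num_points, seed=0):
    """Sample points on the surface of a unit sphere as an (N, 3) array."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(num_points, 3))
    # Normalizing Gaussian samples places every point on the unit sphere.
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)

cloud = sample_sphere_point_cloud(1024)
print(cloud.shape)  # (1024, 3): one row of (x, y, z) per point
```

Only coordinates are stored; nothing in the array says which points are neighbors on the surface.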

Pros:

Cons:

Point clouds are just scattered points; they don’t contain information about
- Which points are connected to each other
- What the actual surface looks like

First, we process each point's coordinates independently with the same fully connected network
Next, we apply max pooling over the per-point outputs to obtain a single feature vector
Finally, we pass the feature vector into a fully connected network that outputs class scores

Because every point goes through the same network and max pooling ignores ordering, the order in which the points are stored does not affect the result
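The three steps can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a trained model: the weights are random, each network is reduced to a single layer, and the names `init_params` and `classify` as well as the layer sizes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, feature_dim=64, num_classes=10):
    # Hypothetical layer sizes, chosen only for illustration.
    return {
        "w_point": rng.normal(scale=0.1, size=(3, feature_dim)),
        "b_point": np.zeros(feature_dim),
        "w_cls": rng.normal(scale=0.1, size=(feature_dim, num_classes)),
        "b_cls": np.zeros(num_classes),
    }

def classify(points, params):
    # 1. Apply the same fully connected layer to every point: (N, 3) -> (N, D)
    per_point = relu(points @ params["w_point"] + params["b_point"])
    # 2. Max pooling over the point axis: (N, D) -> (D,)
    global_feature = per_point.max(axis=0)
    # 3. Fully connected layer on the pooled feature: (D,) -> (num_classes,)
    return global_feature @ params["w_cls"] + params["b_cls"]

rng = np.random.default_rng(0)
params = init_params(rng)
points = rng.normal(size=(128, 3))

scores = classify(points, params)
# Shuffling the rows changes the storage order but not the set of points.
shuffled_scores = classify(rng.permutation(points), params)
print(np.allclose(scores, shuffled_scores))
```

The two score vectors agree because the per-point layer treats every row identically and the maximum over rows does not depend on their order.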

We use a 2D CNN to extract features from the input image

We flatten the image features and send them into a fully connected network to predict a fixed number of points ( $P_{1}$ )
A fully connected network is good at learning the overall structure of the object, but bad at predicting its details

We send the image features into a 2D CNN to predict $P_{2}$ points for every spatial position ( $H^{'} \times W^{'}$ positions in total )
Convolutional layers, on the other hand, are good at predicting the details of the object

Finally, we combine the points predicted by the fully connected network and the CNN to get the final output, for a total of $P_{1} + H^{'} W^{'} P_{2}$ points
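The shapes involved in the two branches can be sketched as follows. This is an illustrative sketch only: the feature extractor is omitted, the weights are random, the convolutional branch is simplified to a 1x1 convolution (the same linear map at every spatial position), and `predict_points` and all sizes are hypothetical.

```python
import numpy as np

def predict_points(features, params, p1, p2):
    """features: image feature map of shape (C, H', W')."""
    c, h, w = features.shape
    # Fully connected branch: flatten the whole map, predict P1 points.
    fc_out = features.reshape(-1) @ params["w_fc"] + params["b_fc"]
    fc_points = fc_out.reshape(p1, 3)
    # Convolutional branch (assumed 1x1 kernel): move channels last so the
    # same weights are applied at each of the H' x W' positions.
    per_position = np.transpose(features, (1, 2, 0)) @ params["w_conv"] + params["b_conv"]
    conv_points = per_position.reshape(h * w * p2, 3)
    # Combine both sets of predictions into one point cloud.
    return np.concatenate([fc_points, conv_points], axis=0)

rng = np.random.default_rng(0)
C, H, W, P1, P2 = 8, 4, 4, 16, 2  # hypothetical sizes
params = {
    "w_fc": rng.normal(scale=0.1, size=(C * H * W, P1 * 3)),
    "b_fc": np.zeros(P1 * 3),
    "w_conv": rng.normal(scale=0.1, size=(C, P2 * 3)),
    "b_conv": np.zeros(P2 * 3),
}
features = rng.normal(size=(C, H, W))
output = predict_points(features, params, P1, P2)
print(output.shape)  # (48, 3), since 16 + 4 * 4 * 2 = 48
```

The fully connected branch sees the entire feature map at once, while each point from the convolutional branch depends only on a local part of it, which matches the global versus detail split described above.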

We need a way to compare point clouds as sets. That is, we don’t want the order in which we store the coordinates to affect the final result

$d_{CD} (S_{1}, S_{2}) = \sum_{x \in S_{1}} \min_{y \in S_{2}} \| x - y \|_{2}^{2} + \sum_{y \in S_{2}} \min_{x \in S_{1}} \| x - y \|_{2}^{2}$

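The subscript 2 denotes the L2 (Euclidean) norm, which contains a square root; squaring the norm cancels that root

i.e. $\| x - y \|_{2}^{2} = \sum_{i} (x_{i} - y_{i})^{2}$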
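A small worked example may help make the distance concrete. The sketch below computes it in NumPy for two tiny point sets; the function name `chamfer_distance` and the sample points are chosen only for illustration, and the dense pairwise matrix it builds is a simplification that suits small sets.

```python
import numpy as np

def chamfer_distance(s1, s2):
    """Sum of squared nearest-neighbor distances in both directions.

    s1: (N, 3) array, s2: (M, 3) array.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = s1[:, None, :] - s2[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # For each x in s1, the nearest y in s2, and vice versa.
    return sq_dists.min(axis=1).sum() + sq_dists.min(axis=0).sum()

s1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
s2 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [3.0, 0.0, 0.0]])

print(chamfer_distance(s1, s2))  # 6.0
# Reordering the rows of either set leaves the distance unchanged.
print(chamfer_distance(s1[::-1], s2[[2, 0, 1]]))  # 6.0
```

Here the first sum is 0 + 1 = 1 and the second is 0 + 1 + 4 = 5, giving 6. Since each term depends only on nearest neighbors, the order in which the points are stored plays no role.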