[WIP] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Estelle Yoon · March 18, 2025


PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Date: 2017
Venue: CVPR

1. Introduction

Point clouds and meshes are not in a regular format, so most prior approaches transform such data into regular 3D voxel grids or collections of images before feeding it to a network

This data representation transformation renders the resulting data unnecessarily voluminous

PointNet instead consumes raw point clouds directly

Since a point cloud is just a set of points, the basic architecture is simple: in the initial stages, each point is processed identically and independently

PointNet is trained to perform 3D shape classification, shape part segmentation, and scene semantic parsing

2. Related Work

Point Cloud Feature

Hand-crafted point features typically encode certain statistical properties of points and are designed to be invariant to certain transformations; they are usually classified as intrinsic or extrinsic, and can also be categorized as local and global features

Deep Learning on 3D Data

Deep Learning on Unordered Sets

One recent work uses a read-process-write network with an attention mechanism to consume unordered input sets

3. Problem Statement

For the object classification task, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud

4. Deep Learning on Point Sets

4.1. Properties of Point Sets in $\mathbb{R}^n$

The input is a subset of points from a Euclidean space

  • Unordered: the network must be invariant to permutations of the input points
  • Interaction among points: neighboring points form a meaningful subset, so the model should capture local structures
  • Invariance under transformations: the learned representation of the point set should be invariant to certain transformations such as rotation and translation

4.2. PointNet Architecture

Three key modules:
  • a max pooling layer as a symmetric function to aggregate information from all the points
  • a local and global information combination structure
  • two joint alignment networks that align both the input points and the point features

Symmetry Function for Unordered Input

  • Sort input into a canonical order

Sorting does not fully resolve the ordering issue: in high-dimensional space there is no ordering that is stable with respect to point perturbations

Applying an MLP directly on the sorted point set still performs poorly, though slightly better than on an unsorted input

  • Treat the input as a sequence and train an RNN

The hope is that by training on randomly permuted sequences, the RNN becomes invariant to the input order

However, for an RNN order does matter and cannot be totally omitted, and RNNs are hard to scale to thousands of input elements

  • Use a simple symmetric function to aggregate the information from each point

The idea is to approximate a general function defined on a point set by applying a symmetric function on transformed elements in the set: $f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n))$, where $h$ is approximated by a shared MLP and $g$ by a max pooling function composed with another continuous function

Due to the simplicity of this module, theoretical analysis is possible; a minimal sketch of the idea follows
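Below is a minimal PyTorch sketch of this symmetric aggregation. The class name and layer sizes are illustrative assumptions, not the paper's exact configuration; the point is that permuting the input points leaves the output unchanged.

```python
import torch
import torch.nn as nn

class SymmetricPointEncoder(nn.Module):
    def __init__(self, in_dim=3, feat_dim=1024):
        super().__init__()
        # h: shared MLP applied identically and independently to every point
        self.h = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points):          # points: (batch, n_points, in_dim)
        per_point = self.h(points)      # (batch, n_points, feat_dim)
        # g: max pooling over the point dimension -> order-invariant global feature
        return per_point.max(dim=1).values

# Permuting the points does not change the global feature
pts = torch.rand(2, 1024, 3)
enc = SymmetricPointEncoder()
perm = torch.randperm(1024)
assert torch.allclose(enc(pts), enc(pts[:, perm, :]))
```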

Local and Global Information Aggregation

Point segmentation requires a combination of local and global knowledge

After computing the global point cloud feature vector, it is fed back by concatenating it with each of the per-point features

New per-point features are then extracted based on the combined point features, so each point is aware of both local and global information
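A small sketch of this concatenation step (the helper name and shapes are my assumptions for illustration):

```python
import torch

def combine_local_global(per_point_feat, global_feat):
    """Concatenate the global feature to every per-point feature."""
    # per_point_feat: (batch, n_points, d_local); global_feat: (batch, d_global)
    n_points = per_point_feat.shape[1]
    expanded = global_feat.unsqueeze(1).expand(-1, n_points, -1)  # (batch, n_points, d_global)
    return torch.cat([per_point_feat, expanded], dim=-1)          # (batch, n_points, d_local + d_global)
```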

Joint Alignment Network

A mini-network (T-Net) predicts an affine transformation matrix, and this transformation is directly applied to the coordinates of the input points

The mini-network itself resembles the big network and is composed of basic modules: point-independent feature extraction, max pooling, and fully connected layers
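A rough PyTorch sketch of such a mini-network (layer widths and the identity initialization follow common PointNet implementations, but treat the details here as assumptions):

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a k x k transform from the point set and applies it to the input."""
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        # point-independent feature extraction (shared MLP)
        self.mlp = nn.Sequential(
            nn.Linear(k, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        # fully connected layers applied after max pooling
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, k * k),
        )

    def forward(self, points):                        # points: (batch, n_points, k)
        feat = self.mlp(points).max(dim=1).values     # symmetric max pooling
        mat = self.fc(feat).view(-1, self.k, self.k)
        # start near the identity so training begins from "no transform"
        mat = mat + torch.eye(self.k, device=points.device)
        return torch.bmm(points, mat)                 # transformed coordinates
```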

The transformation matrix in the feature space has a much higher dimension than the spatial transformation matrix, which greatly increases the difficulty of optimization

Therefore a regularization term is added to the softmax training loss, constraining the feature transformation matrix $A$ to be close to an orthogonal matrix: $L_{reg} = \|I - AA^T\|_F^2$
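A minimal sketch of this regularizer (the function name is my own; the batch-mean reduction is one common choice, not necessarily the paper's):

```python
import torch

def feature_transform_regularizer(A):
    """L_reg = ||I - A A^T||_F^2 for a batch of k x k feature transform matrices."""
    k = A.shape[-1]
    identity = torch.eye(k, device=A.device).unsqueeze(0)   # (1, k, k)
    diff = identity - torch.bmm(A, A.transpose(1, 2))       # (batch, k, k)
    return (diff ** 2).sum(dim=(1, 2)).mean()               # squared Frobenius norm, batch mean
```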

4.3. Theoretical Analysis

Universal approximation

The network has the ability to universally approximate continuous set functions, given enough neurons at the max pooling layer

Theorem 1.

Suppose $f : \mathcal{X} \rightarrow \mathbb{R}$ is a continuous set function with respect to the Hausdorff distance $d_H(\cdot, \cdot)$; then for any $\epsilon > 0$ there exist a continuous function $h$ and a symmetric function $g(x_1, \dots, x_n) = \gamma \circ \mathrm{MAX}$ such that $\left| f(S) - \gamma\big(\mathrm{MAX}_{x_i \in S}\{h(x_i)\}\big) \right| < \epsilon$ for any $S \in \mathcal{X}$

In the worst case, the network can learn to convert a point cloud into a volumetric representation by partitioning the space into equal-sized voxels

Bottleneck dimension and stability

The expressiveness of the network is strongly affected by the dimension of the max pooling layer, i.e., $K$

Define $u = \mathrm{MAX}_{x_i \in S}\{h(x_i)\}$ to be the sub-network of $f$ which maps a point set in $[0, 1]^m$ to a $K$-dimensional vector

Theorem 2.

For every point set $S$ there exist a critical point set $\mathcal{C}_S$ and an upper-bound shape $\mathcal{N}_S$ such that $f(T) = f(S)$ for any $T$ with $\mathcal{C}_S \subseteq T \subseteq \mathcal{N}_S$, and $|\mathcal{C}_S| \leq K$

This means $f(S)$ is unchanged as long as all points in $\mathcal{C}_S$ are preserved, and extra noise points are tolerated up to $\mathcal{N}_S$

Robustness is gained in analogy to the sparsity principle in machine learning models

Intuitively, the network learns to summarize a shape by a sparse set of key points
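As an illustration of the critical point set idea, here is a small sketch (names and shapes are my assumptions) that recovers the points selected by the max pooling:

```python
import torch

def critical_point_indices(per_point_feat):
    """Indices of points that achieve the per-dimension maxima (so |C_S| <= K)."""
    # per_point_feat: (n_points, K) features h(x_i) before max pooling
    return torch.unique(per_point_feat.argmax(dim=0))
```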
