PointNet does not capture local structures induced by the metric space the points live in, which limits its ability to recognize fine-grained patterns and to generalize to complex scenes.
PointNet++ addresses this limitation and can be viewed as an extension of PointNet with an added hierarchical structure.
PointNet++ leverages neighborhoods at multiple scales to achieve both robustness and detail capture.
The network learns to adaptively weight patterns detected at different scales and to combine multi-scale features according to the input data.
As a result, PointNet++ can learn features even from non-uniformly sampled point sets.
Suppose X = (M, d) is a discrete metric space, where M is the set of points and d is the distance metric.
This paper is interested in learning set functions f that take such an X as input and produce information of semantic interest regarding X.
In practice, f can be a classification function that assigns a label to X, or a segmentation function that assigns a per-point label to each member of M.
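To make the two kinds of set function concrete, here is a minimal sketch assuming a PointNet-style design: one function `h` is applied to every point, and the results are max-pooled, so the set-level output does not depend on point order. The function `h`, the threshold, and the labels "A"/"B" are hypothetical toy choices standing in for learned layers; only the structure (shared per-point function plus max pooling) is the point.

```python
def h(p):
    """Hypothetical per-point function shared by all points (stand-in for a learned MLP)."""
    x, y, z = p
    return (max(0.0, x + y), max(0.0, z - x))

def set_feature(points):
    """Global feature of the set: apply h to every point, then max-pool each channel."""
    feats = [h(p) for p in points]
    return tuple(max(f[c] for f in feats) for c in range(len(feats[0])))

def classify(points, threshold=1.0):
    """Toy classification f(X): one (hypothetical) label for the whole set."""
    return "A" if set_feature(points)[0] > threshold else "B"

def segment(points):
    """Toy segmentation: one label per member of M, comparing each point's own
    feature with the global feature."""
    g = set_feature(points)
    return [1 if h(p)[0] >= 0.5 * g[0] else 0 for p in points]

pts = [(0.1, 0.2, 0.3), (1.0, 0.5, -0.2), (-0.3, 0.4, 0.9)]
print(classify(pts), segment(pts))  # A [0, 1, 0]
# Reordering the points does not change the set-level output.
assert set_feature(pts) == set_feature(pts[::-1])
```

Max pooling is used here because it is symmetric in its inputs; any symmetric aggregation would keep the classification output order-independent.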
While PointNet uses a single max pooling operation to aggregate the whole point set, this architecture builds a hierarchical grouping of points and progressively abstracts larger and larger local regions along the hierarchy.
The hierarchical structure is composed of a number of set abstraction levels.
At each level, a set of points is processed and abstracted to produce a new set with fewer elements.
Each set abstraction level is made of three key layers:
1) Sampling layer,
2) Grouping layer and
3) PointNet layer.
Sampling layer: Selects a set of points from the input points; these define the centroids of local regions.
Grouping layer: Constructs local region sets by finding "neighboring" points around the centroids.
PointNet layer: Uses a mini-PointNet to encode local region patterns into feature vectors.
-input: N x (d + C) matrix from N points, with d-dim coordinates and C-dim point features.
-output: N' x (d + C') matrix of N' subsampled points, with d-dim coordinates and new C'-dim feature vectors summarizing local context.
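The shape bookkeeping of a set abstraction level can be traced with a few lines of code. This is only a sketch: the helper names `sa_output_shape` and `hierarchy_shapes` and the level sizes in the example are hypothetical, chosen to show that each level keeps the d coordinate columns, replaces the C feature columns with C', and reduces N to N'.

```python
def sa_output_shape(n, d, c, n_out, c_out):
    """One set abstraction level maps an N x (d + C) input to N' x (d + C')."""
    if not 0 < n_out <= n:
        raise ValueError("a level must keep between 1 and N points")
    return n_out, d, c_out  # coordinates keep dimension d; features change to C'

def hierarchy_shapes(n, d, c, levels):
    """Matrix shapes (rows, cols) after each level; `levels` lists (N', C') pairs."""
    shapes = [(n, d + c)]
    for n_out, c_out in levels:
        n, d, c = sa_output_shape(n, d, c, n_out, c_out)
        shapes.append((n, d + c))
    return shapes

# Hypothetical two-level hierarchy on 1024 xyz points with no input features.
print(hierarchy_shapes(1024, 3, 0, [(512, 128), (128, 256)]))
# [(1024, 3), (512, 131), (128, 259)]
```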
Sampling layer: Farthest Point Sampling (FPS)
Given input points {x1, x2, ..., xN},
this layer uses iterative farthest point sampling (FPS) to choose a subset of points {xi1, xi2, ..., xiN'}, such that xij is the point most distant (in metric distance) from the set {xi1, xi2, ..., xi(j-1)} among the remaining points.
Performance: compared with random sampling, FPS gives better coverage of the entire point set for the same number of centroids.
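The FPS rule above can be sketched in plain Python, assuming Euclidean distance as the metric and the first point as the starting centroid (both assumptions; the starting index is a free parameter). The function keeps, for every point, its distance to the nearest already-selected point and repeatedly picks the point where that distance is largest.

```python
import math

def farthest_point_sampling(points, n_samples, start=0):
    """Indices of n_samples points chosen by iterative farthest point sampling.

    Each step picks the remaining point whose distance to its nearest
    already-selected point is largest.
    """
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")
    selected = [start]
    # min_dist[i]: distance from point i to the nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    min_dist[start] = -1.0  # mark as selected so it is never picked again
    for _ in range(n_samples - 1):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, p in enumerate(points):
            if min_dist[i] >= 0.0:  # only update points not yet selected
                min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

line = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(farthest_point_sampling(line, 3))  # [0, 4, 2]
```

On this toy line, FPS picks both ends and then a middle point, whereas a random draw of three points could easily land in the dense cluster near the origin. This naive version costs O(N x N') distance evaluations.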
Grouping layer: Radius-based ball query
-input: a point set of size N x (d + C) and the coordinates of a set of centroids of size N' x d
-output: groups of point sets of size N' x K x (d + C),
where each group corresponds to a local region and K is the number of points in the neighborhood of each centroid. Ball query finds all points within a radius of the centroid, with an upper limit of K points.
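A minimal ball query sketch, assuming Euclidean distance and points scanned in input order. How to fill a group that has fewer than K neighbors is an assumption here: the sketch pads by repeating the first index found so that every group has exactly K entries. `ball_query` and `group_points` are hypothetical helper names.

```python
import math

def ball_query(coords, centroids, radius, k):
    """Return, for each centroid, indices of up to k points within `radius`.

    Groups with fewer than k hits are padded by repeating their first index
    (an assumed convention) so that every group has exactly k entries.
    """
    groups = []
    for c in centroids:
        idx = [i for i, p in enumerate(coords) if math.dist(p, c) <= radius][:k]
        if not idx:
            raise ValueError(f"no point within radius {radius} of centroid {c}")
        idx += [idx[0]] * (k - len(idx))
        groups.append(idx)
    return groups

def group_points(rows, groups):
    """Gather full rows (d coords + C features): result is N' x K x (d + C)."""
    return [[rows[i] for i in g] for g in groups]

rows = [(0, 0, 1.0), (0.1, 0, 2.0), (0, 0.2, 3.0), (5, 5, 4.0)]  # d = 2, C = 1
coords = [r[:2] for r in rows]
groups = ball_query(coords, [(0, 0), (5, 5)], radius=0.5, k=3)
print(groups)  # [[0, 1, 2], [3, 3, 3]]
grouped = group_points(rows, groups)  # 2 x 3 x 3
```

Because the radius is fixed, every group covers the same spatial scale regardless of how densely the region is sampled, unlike a K-nearest-neighbor query.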
PointNet layer
This paper uses PointNet as the basic building block for local pattern learning.
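The coordinates of points in each local region are first translated into a frame relative to the centroid. By using these relative coordinates together with point features, the mini-PointNet can capture point-to-point relations in the local region.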
-input: N' local regions of points with data size N' x K x (d + C)
-output: each local region is abstracted by its centroid and a local feature that encodes the centroid's neighborhood. Data size is N' x (d + C').
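The PointNet layer for a single region can be sketched as: translate coordinates relative to the centroid, apply one shared perceptron to every point, then max-pool over the K points. This is a simplification under stated assumptions: a single linear layer with ReLU replaces the multi-layer MLP, and `weights` (C' x (d + C)) and `bias` (C') are hypothetical stand-ins for learned parameters.

```python
def mini_pointnet(centroid, group, d, weights, bias):
    """Encode one local region into [centroid coords] + [C' pooled features].

    `group` holds K rows of d coordinates followed by C point features.
    `weights` is a C' x (d + C) matrix and `bias` has C' entries; both are
    hypothetical stand-ins for learned parameters.
    """
    per_point = []
    for row in group:
        # Translate coordinates into the centroid's local frame.
        rel = [row[j] - centroid[j] for j in range(d)]
        x = rel + list(row[d:])
        # Shared one-layer perceptron with ReLU, applied identically to every point.
        h = [max(0.0, sum(w * xi for w, xi in zip(w_row, x)) + b)
             for w_row, b in zip(weights, bias)]
        per_point.append(h)
    # Max pooling over the K points gives an order-independent region feature.
    pooled = [max(h[c] for h in per_point) for c in range(len(bias))]
    return list(centroid) + pooled

group = [(0, 0, 1.0), (0.1, 0, 2.0), (0, 0.2, 3.0)]  # K = 3, d = 2, C = 1
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]          # C' = 3
print(mini_pointnet((0, 0), group, 2, identity, [0, 0, 0]))  # [0, 0, 0.1, 0.2, 3.0]
```

Using relative coordinates means that shifting a region and its centroid by the same offset leaves the pooled features unchanged, so the layer learns local shape rather than absolute position.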
Challenge for point set feature learning
Solution of this paper
This paper proposes PointNet++ for processing point sets sampled in a metric space.
It operates recursively on a nested partitioning of the input point set and is effective at learning hierarchical features with respect to the distance metric.
To handle non-uniform point sampling, this paper proposes two set abstraction layers that aggregate multi-scale information according to local point densities.
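One of the two layers in the paper is multi-scale grouping (MSG): group around the same centroid at several radii, encode each group, and concatenate the results. The sketch below shows only that grouping-and-concatenation pattern. The per-scale encoder `encode_region` is a hypothetical toy (max-pooling distance to centroid and point features) standing in for a learned mini-PointNet, and the radii and K are illustrative values.

```python
import math

def encode_region(centroid, rows, d):
    """Toy stand-in for a per-scale mini-PointNet: max-pool
    [distance to centroid, point features...] over the region."""
    vecs = [[math.dist(r[:d], centroid)] + list(r[d:]) for r in rows]
    return [max(v[j] for v in vecs) for j in range(len(vecs[0]))]

def msg_feature(rows, centroid, d, radii, k):
    """Multi-scale grouping sketch: one ball query per radius, encode each
    region, then concatenate the per-scale features."""
    feature = []
    for r in radii:
        region = [row for row in rows if math.dist(row[:d], centroid) <= r][:k]
        if not region:
            raise ValueError(f"no point within radius {r} of centroid {centroid}")
        feature += encode_region(centroid, region, d)
    return feature

rows = [(0, 0, 1.0), (0.1, 0, 2.0), (0, 0.2, 3.0), (1, 0, 9.0)]  # d = 2, C = 1
print(msg_feature(rows, (0, 0), d=2, radii=[0.15, 1.5], k=8))
```

In this example the small radius sees only the two nearest points while the large radius sees all four, so the concatenated vector carries both fine and coarse context; in the paper, the network learns how to weight such scales.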