PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

YEOM JINSEOP · September 7, 2023

ML For 3D Data

🚀 Motivations

  • CNNs have boosted the performance of 2D segmentation.
    However, 3D point clouds are unordered and unstructured,
    so 2D methods cannot be directly extended to 3D points.

  • ➡️ The paper designs a bottom-up, end-to-end framework, PointGroup,
    for 3D instance segmentation,
    with the key target of better grouping of points.

🔑 Key Contributions

  • The paper presents PointGroup,
    a new end-to-end bottom-up architecture,
    specifically focused on better grouping the points
    by exploring the void space between objects,
    to deal with the challenging 3D instance segmentation task.

  • Proposes a point clustering method
    based on dual coordinate sets,
    i.e., the original and shifted sets.
    Along with the new ScoreNet,
    object instances can be better segmented out.

⭐ Methods

Overview

  • PointGroup has the key target of better grouping points.

  • Two main problems to deal with
    1) separate the contents in 3D space into individual objects
    2) determine the semantic label of each object.

  • Backbone Network
    A backbone extracts point features,
    and a two-branch network on top of it
    predicts semantic labels (semantic segmentation branch)
    and per-point offsets (offset branch)
    for shifting each point
    toward its respective instance centroid.

  • Clustering
    Adopt an effective algorithm to group points into clusters.
    For each point, take its coordinates as a reference,
    group it with nearby points of the same label,
    and expand the group progressively.

  • Consider two coordinate sets in two separate passes (called "Dual-Set Point Grouping")
    1) the original point positions
    2) those shifted by the predicted offsets.

  • ScoreNet
    Formulate the ScoreNet
    to evaluate and pick candidate groups,
    followed by NMS (Non-Maximum Suppression)
    to remove duplicate predictions.

Backbone Network

  • Input: point set P of N points.
    Each point has a color fi = (ri, gi, bi)
    and 3D coordinates pi = (xi, yi, zi)
    (where i ∈ {1, ..., N})

  • The backbone extracts a point-wise feature Fi for each point.
    : output feature of the backbone is F = {Fi} ∈ ℝ^(N x K)
    (where K: # of channels)

  • Paper's Implementation
    1) Voxelize the points and construct a U-Net
    with SSC (Submanifold Sparse Convolution) and SC (Sparse Convolution)
    2) Then recover points from voxels
    to obtain the point-wise features.

  • Contextual and geometric information
    is well extracted by the U-Net,
    which provides discriminative point-wise features F.
Two branches

  • Feed feature F into two branches
    1) one for semantic segmentation
    2) the other for predicting a per-point offset vector
    to shift each point towards the centroid of its respective object instance.
    (si: predicted semantic label of point i,
    oi = (Δxi, Δyi, Δzi): offset vector of point i)

Semantic Segmentation Branch
  • Apply an MLP to F
    to produce semantic scores SC = {sc1, ..., scN} ∈ ℝ^(N x Nclass)
    for the N points over the Nclass classes.

  • Regularize the results
    by a cross entropy loss Lsem.

  • Predicted semantic label si for point i
    is the class with the maximum score,
    i.e., si = argmax(sci)
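
The argmax and cross-entropy steps above can be sketched in plain Python. This is only a minimal illustration: the function names are hypothetical, and it assumes the scores are unnormalized logits that go through a softmax before the loss, which the notes above do not state explicitly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one point's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(sc):
    """s_i = argmax(sc_i): index of the highest-scoring class per point."""
    return [max(range(len(row)), key=row.__getitem__) for row in sc]

def cross_entropy(sc, gt_labels):
    """Mean cross-entropy over all points (assumes sc holds logits)."""
    losses = [-math.log(softmax(row)[y]) for row, y in zip(sc, gt_labels)]
    return sum(losses) / len(losses)

# Toy scores for N = 3 points over Nclass = 3 classes.
sc = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [0.0, 1.5, 0.3]]
labels = predict_labels(sc)          # predicted semantic labels s_i
loss = cross_entropy(sc, [0, 2, 1])  # L_sem against ground-truth labels
```

In a real implementation this would be a batched tensor operation; the loop form is only meant to make the per-point definition explicit.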

Offset Prediction Branch
  • Encodes F
    to produce N offset vectors O = {o1, ..., oN} ∈ ℝ^(Nx3)
    for the N points.

  • For points belonging to the same instance,
    constrain their learned offsets by an L1 regression loss
    against the vector from each point to its instance centroid.

  • The paper finds it hard to regress precise offsets,
    particularly for boundary points of large-size objects,
    since these points are relatively far from the instance centroids.

  • To address this issue,
    formulate a direction loss
    to constrain the direction of the predicted offset vectors.
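
The two offset losses can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes the regression target for point i is (instance centroid − pi), that the direction loss is the negative cosine similarity between the predicted offset and that target, and that both are averaged over points belonging to an instance. The names `offset_losses` and `instance_ids` are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))

def offset_losses(p, o, instance_ids):
    """Return (L1 regression loss, direction loss).

    instance_ids[i] is None for points that belong to no instance;
    those points are masked out of both losses.
    """
    groups = {}
    for pt, inst in zip(p, instance_ids):
        if inst is not None:
            groups.setdefault(inst, []).append(pt)
    centers = {inst: centroid(pts) for inst, pts in groups.items()}

    reg, direction, count = 0.0, 0.0, 0
    for pt, off, inst in zip(p, o, instance_ids):
        if inst is None:
            continue
        target = tuple(c - x for c, x in zip(centers[inst], pt))
        # L1 distance between predicted offset and target offset.
        reg += sum(abs(a - b) for a, b in zip(off, target))
        norm_o = math.sqrt(sum(a * a for a in off))
        norm_t = math.sqrt(sum(b * b for b in target))
        if norm_o > 0 and norm_t > 0:
            # Negative cosine similarity: -1 when directions agree.
            direction -= sum(a * b for a, b in zip(off, target)) / (norm_o * norm_t)
        count += 1
    if count == 0:
        return 0.0, 0.0
    return reg / count, direction / count

# Two points of one instance plus one background point.
p = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
o = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
reg_loss, dir_loss = offset_losses(p, o, [0, 0, None])
```

With perfect offsets the regression loss is zero and the direction loss reaches its minimum of −1; the direction term stays informative even when the offset magnitude is hard to regress, which is the motivation given above.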

Clustering Algorithm

  • After obtaining the semantic labels,
    begin to group points
    into instance clusters
    based on the void space between objects.

  • The clustering method groups points
    that are close to each other
    into the same cluster,
    if they have the same semantic label.

  • However, clustering directly based on
    the point coordinate set P = {pi}
    may fail to separate same category objects
    that are close to each other in 3D space
    and mis-group them.

  • Thus, use learned offset oi
    1) to shift point i
    towards its respective instance centroid
    2) obtain shifted coordinates qi = pi + oi ∈ ℝ^3

  • For points belonging to the same object instance,
    unlike pi,
    the shifted coordinates qi cluster around the same centroid.

  • So by clustering based on the shifted coordinate set Q = {qi},
    nearby objects are separated better,
    even if they have the same semantic label.

  • However, for points near an object boundary,
    the predicted offset may not be accurate.
    So, the clustering algorithm employs
    "dual" point coordinate sets
    (original coordinates P and shifted coordinates Q).

  • The clustering result is C = C^p ∪ C^q
    (the clusters discovered based on P and Q, respectively)

  • Core step of the algorithm:
    1) for point i,
    get the points within the ball of radius r
    centered at xi (the coordinates of point i in the chosen set, pi or qi)
    2) and group the points with the same semantic label
    as point i into the same cluster.
    (r serves as a spatial constraint in the clustering,
    so that two intra-category objects
    at a distance larger than r are not grouped.)

  • Use BFS to group points of the same instance into a cluster.
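
The dual-set grouping described above can be sketched as a BFS over a radius neighborhood. This is a simplified sketch under stated assumptions: neighbor search is brute force (a real implementation would use a spatial index or GPU ball query), and `min_points` is a hypothetical parameter for discarding tiny clusters.

```python
import math
from collections import deque

def cluster_points(coords, labels, r, min_points=1):
    """BFS grouping: points closer than r that share a semantic label
    end up in the same cluster. Returns lists of point indices."""
    n = len(coords)
    visited = [False] * n
    clusters = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        cluster = []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            # Ball query around point i, restricted to the same label.
            for j in range(n):
                if (not visited[j] and labels[j] == labels[i]
                        and math.dist(coords[i], coords[j]) < r):
                    visited[j] = True
                    queue.append(j)
        if len(cluster) >= min_points:
            clusters.append(sorted(cluster))
    return clusters

def dual_set_grouping(p, o, labels, r):
    """C = C^p ∪ C^q: cluster on original and on shifted coordinates."""
    q = [tuple(a + b for a, b in zip(pi, oi)) for pi, oi in zip(p, o)]
    return cluster_points(p, labels, r) + cluster_points(q, labels, r)

# Two same-class objects standing close together along the x axis.
p = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.0, 0.0), (1.2, 0.0, 0.0)]
o = [(0.2, 0.0, 0.0), (-0.2, 0.0, 0.0), (0.2, 0.0, 0.0), (-0.2, 0.0, 0.0)]
proposals = dual_set_grouping(p, o, labels=[1, 1, 1, 1], r=0.5)
# proposals == [[0, 1, 2, 3], [0, 1], [2, 3]]
```

In this toy case, clustering on P merges both objects into one proposal, while clustering on Q separates them; both kinds of proposals are kept and ranked later by ScoreNet.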

ScoreNet

  • ScoreNet
    1) processes the proposed point clusters C = C^p ∪ C^q
    2) produces a score per cluster proposal.

  • NMS is applied to these proposals
    with the scores
    to generate the final instance predictions.
    (G: instance predictions = {G1, ..., GMpred} ⊆ C,
    I: GT instances = {I1, ..., IMgt})

  • Input: set of candidate clusters C = {C1, ..., CM}
    (M: # of candidate clusters,
    Ci: i-th cluster,
    Ni: # of points in Ci
    )

  • Goal of ScoreNet
    : to predict a score for each cluster
    that indicates the quality of the associated cluster proposal,
    so that the better clusters are preserved in NMS,
    thus combining the strengths of C^p and C^q.

  • For each cluster,
    1) gather the point features from F ∈ ℝ^(N x K)
    (features extracted by the backbone)
    2) and form Fci = {F_h(Ci,1), ..., F_h(Ci,Ni)},
    where h maps the point index in Ci
    to the corresponding point index in P.
    Similarly, express the coordinates of the points in Ci as
    Pci = {p_h(Ci,1), ..., p_h(Ci,Ni)}.

  • To better aggregate the cluster info,
    take Fci and Pci as the initial features and coordinates,
    and voxelize the clusters.

  • Feature for each voxel is average-pooled
    from the initial features of points
    in that voxel.

  • Then feed them into a small U-Net with SSC and SC
    to further encode the features.

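  • Cluster-aware max-pooling then follows
    to produce a single cluster feature vector per cluster.

  • Final cluster scores Sc
    are obtained by passing the cluster feature vectors
    through an MLP followed by a sigmoid.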
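
The aggregation steps of ScoreNet (voxel average-pooling, cluster-aware max-pooling, scoring) can be sketched roughly as below. Several things here are assumptions for illustration only: the small U-Net between pooling steps is skipped entirely, a single linear layer with made-up `weights` and `bias` stands in for the MLP, and the sigmoid output is assumed to be the final score.

```python
import math
from collections import defaultdict

def voxel_average_pool(coords, feats, voxel_size):
    """Average the features of all points that fall in the same voxel."""
    buckets = defaultdict(list)
    for c, f in zip(coords, feats):
        key = tuple(math.floor(x / voxel_size) for x in c)
        buckets[key].append(f)
    return {
        key: [sum(col) / len(fs) for col in zip(*fs)]
        for key, fs in buckets.items()
    }

def cluster_max_pool(voxel_feats):
    """Channel-wise max over the voxel features of one cluster."""
    return [max(col) for col in zip(*voxel_feats)]

def cluster_score(cluster_feat, weights, bias):
    """Sigmoid of one linear layer (hypothetical stand-in for the MLP)."""
    z = sum(w * x for w, x in zip(weights, cluster_feat)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy cluster with three points and K = 2 feature channels.
coords = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.0, 0.0)]
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
voxels = voxel_average_pool(coords, feats, voxel_size=1.0)
cluster_feat = cluster_max_pool(list(voxels.values()))
score = cluster_score(cluster_feat, weights=[0.0, 0.0], bias=0.0)
```

Max-pooling over the cluster yields a fixed-length vector regardless of how many points or voxels the cluster contains, which is what lets one scoring head handle proposals of any size.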

Network Training and Inference

  • Inference
    perform NMS on the clusters C
    with the predicted scores Sc
    to obtain the final instance predictions G ⊆ C.

  • The IoU threshold is empirically set to 0.3
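
A minimal sketch of NMS over cluster proposals follows, assuming each cluster is a list of point indices and that IoU is computed on those index sets. The function names are hypothetical; only the 0.3 threshold comes from the notes above.

```python
def cluster_iou(a, b):
    """Intersection over union of two clusters given as point-index lists."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def nms(clusters, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring cluster, drop proposals that
    overlap an already kept cluster by more than iou_threshold."""
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(cluster_iou(clusters[i], clusters[k]) <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep

# One merged proposal (e.g. from P) and two separated ones (e.g. from Q).
clusters = [[0, 1, 2, 3], [0, 1], [2, 3]]
scores = [0.4, 0.9, 0.8]
kept = nms(clusters, scores)  # indices of the final instances G
```

Here the merged proposal overlaps a higher-scoring one with IoU 0.5 and is suppressed, so the two separated clusters survive. This is how the scores let the method pick the better proposal from either coordinate set.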

👨🏻‍🔬 Experimental Results

  • Datasets
    ScanNet v2: 1613 scans with 3D object instance annotations.
    S3DIS: 3D scans across six areas with 271 scenes in total.
    Each point is assigned one label out of 13 semantic classes.

  • Evaluation Metrics
    mAP (mean average precision)

Ablation study

Clustering based on Different Coordinate Sets

Ablation on Clustering Radius r

  • A small r is sensitive to point density;
    a large r increases the risk of grouping two nearby same-class objects into one.

Quantitative Results

Qualitative Results

✅ Conclusion

  • Proposed PointGroup for 3D instance segmentation,
    with a focus on
    better grouping points by exploring
    1) the in-between space
    2) and the point semantic labels
    among the object instances.

  • Considering the situation where
    two intra-category objects
    may be very close to each other,
    the paper designs a two-branch network
    to learn, respectively, a per-point semantic label
    and a per-point offset vector
    for moving each point towards its instance centroid.

  • Clusters points based on both
    1) original point coordinates
    2) offset-shifted point coordinates
    and combines the strengths of the two coordinate sets
    to optimize point grouping precision.

  • Introduced ScoreNet
    to learn to evaluate the generated candidate clusters.

  • Followed by NMS
    to avoid duplicates
    before outputting the final predicted instances.

🛩️ Future work

  • Introduce a progressive refinement module
    to relieve the semantic inaccuracy problem
    that affects the instance grouping.

  • Explore the possibility of
    incorporating weakly- or self-supervised techniques
    to boost the performance.
