AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss

YEOM JINSEOP · September 1, 2023

ML For 3D Data


🚀 Motivations

  • Co-segmentation is intrinsically contextual:
    how a shape is segmented can vary
    depending on the set it belongs to.
    ➡️ The paper's network features an adaptive learning module
    that produces a consistent shape segmentation adapted to each set.

  • Previous deep learning models for shape segmentation
    were mostly trained to target a fixed set of semantic labels.
    As a result, the segmentation of a given shape is also fixed
    and cannot adapt to the context of a shape set,
    which is a key feature of co-segmentation.

🔑 Key Contribution

  • First DNN for adaptive shape co-segmentation

  • Novel and effective group consistency loss
    based on low-rank approximations

  • Co-segmentation training framework
    that needs no GT consistent segmentation labels.

⭐ Methods

Overview

  • Given an input set of unsegmented shapes,

1️⃣ First, employ an "offline pre-trained" part prior network ➡️ to propose per-shape parts.

2️⃣ Then, the co-segmentation network iteratively and jointly optimizes the part labelings
across the set, subject to a novel group consistency loss defined by matrix ranks.

  • AdaCoSeg: DNN for adaptive co-segmentation of a set of 3D shapes
    represented as point clouds
    📥 input: set of unsegmented shapes represented as point clouds
    📤 output: K-way consistent part labeling for the input set,
    obtained without GT segmentations

Part prior network

"Learns to denoise an imperfectly segmented part"

📥 Input: 3D point cloud shape S
with noisy binary labeling,
where the FG represents an imperfect part

(Points belonging to the proposed part
constitute the FG (F ⊂ S),
while the remaining points are the BG (B = S \ F))

📤 Output: a probability for each point (q ∈ S),
such that the high-probability points
collectively define the ideal, "clean" part
that best matches the proposed part,
thereby denoising the noisy foreground.

(= a regularized labeling leading to a refined part)

Architecture

MSG and MRG modules produce
two context-sensitive 128-D feature vectors per point

  • MSG (multi-scale grouping):
    captures the context of a point at multiple scales
    by concatenating features
    over larger and larger neighborhoods.

  • MRG (multi-resolution grouping):
    computes a similar multi-scale feature,
    but the feature of a large neighborhood is computed recursively
    from the feature of the next smaller neighborhood.

  • The paper averages the MSG features of the FG points
    to obtain a robust part descriptor,
    which is concatenated with the MRG feature
    of each point
    to produce one feature pair per point.

  • The pairs are fed to a binary classifier with ReLU activations,
    whose output indicates
    the "cleaned" FG and BG.
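To make the MSG/MRG distinction and the pairing step concrete, here is a toy NumPy sketch. Everything in it is an assumption for illustration: the radii, the hand-crafted per-scale statistics standing in for learned PointNet++-style layers, and the untrained MLP weights. Feature sizes are tiny rather than 128-D, and the function names are hypothetical.

```python
import numpy as np

RADII = (0.1, 0.2, 0.4)  # hypothetical neighborhood radii (not from the paper)

def pairwise_dists(points):
    # (N, N) Euclidean distances between all points.
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

def msg_features(points, radii=RADII):
    # MSG (simplified): a feature computed independently at each scale, then
    # concatenated. Stand-in for a mini-PointNet: mean neighbor offset + count.
    d = pairwise_dists(points)
    feats = []
    for r in radii:
        nbr = (d <= r).astype(float)              # ball includes the point itself
        cnt = nbr.sum(axis=1, keepdims=True)
        feats += [nbr @ points / cnt - points, cnt]
    return np.concatenate(feats, axis=1)          # (N, 4 * len(radii))

def mrg_features(points, radii=RADII):
    # MRG (simplified): each larger scale is computed recursively from the
    # previous, smaller scale's features instead of from the raw points.
    d = pairwise_dists(points)
    feats, levels = points, []
    for r in radii:
        nbr = (d <= r).astype(float)
        feats = nbr @ feats / nbr.sum(axis=1, keepdims=True)
        levels.append(feats)
    return np.concatenate(levels, axis=1)         # (N, 3 * len(radii))

def make_pairs(msg, mrg, fg):
    # Average the MSG features over the FG points into one part descriptor,
    # then concatenate it with every point's MRG feature.
    part_desc = msg[fg].mean(axis=0)
    return np.hstack([np.tile(part_desc, (len(mrg), 1)), mrg])

def classify(pairs, hidden=32, seed=1):
    # Hypothetical untrained 2-layer MLP with ReLU; the sigmoid output is
    # read as P(point belongs to the cleaned part).
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((pairs.shape[1], hidden)) / np.sqrt(pairs.shape[1])
    W2 = rng.standard_normal(hidden) / np.sqrt(hidden)
    return 1.0 / (1.0 + np.exp(-(np.maximum(pairs @ W1, 0.0) @ W2)))

pts = np.random.default_rng(0).random((100, 3))
fg = pts[:, 0] < 0.3                              # crude noisy "proposed part"
prob = classify(make_pairs(msg_features(pts), mrg_features(pts), fg))
clean_fg = prob > 0.5                             # thresholded "cleaned" FG
```

The key design difference shows up in the two loops: `msg_features` always aggregates raw points, while `mrg_features` feeds each scale's output into the next.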

🏋🏻‍♂️ Training Dataset: ComplementMe dataset(a subset of ShapeNet)

  • Trained to denoise binary labelings

It learns
1) what a valid part looks like, through training
on a labeling denoising task

2) a multi-scale, part-aware shape feature
at each point,
which can be used later in the co-seg network.
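The post does not say how the noisy training labelings are generated, so the sketch below assumes a simple noise model (randomly flipping a fraction of a clean part's labels) and a per-point binary cross-entropy as the denoising loss. Both choices and the helper names are hypothetical.

```python
import numpy as np

def make_denoising_example(gt_mask, flip_frac=0.1, seed=0):
    # Hypothetical noise model: flip a random fraction of a clean part's
    # binary labels. Returns (noisy input labeling, clean target).
    rng = np.random.default_rng(seed)
    noisy = gt_mask.copy()
    n_flip = int(round(flip_frac * gt_mask.size))
    idx = rng.choice(gt_mask.size, size=n_flip, replace=False)
    noisy[idx] = ~noisy[idx]
    return noisy, gt_mask

def bce(prob, target, eps=1e-7):
    # Mean per-point binary cross-entropy against the clean part.
    p = np.clip(prob, eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

gt = np.zeros(100, dtype=bool)
gt[:30] = True                       # a "clean" part: the first 30 points
noisy, clean = make_denoising_example(gt)
```

A network that simply copied its noisy input would pay the loss on every flipped point, which is what pushes it to learn what a valid part looks like.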

Co-segmentation network

📥 input: set of unsegmented shapes represented as point clouds

📤 output: K-way consistent part labeling for the input set.

  • Learns the optimal network weights
    through back-propagation
    based on a group consistency loss
    defined over the input set.

  • For each part generated by the K-way classification,
    a binary segmentation is formed and fed into
    the pre-trained part prior network
    (1) to compute a refined K-part segmentation and
    (2) to extract a part-aware feature for each segment.
    ➡️ These together form a part feature for each segment.

  • The part features with the same label,
    taken across all shapes in the set,
    constitute the part feature matrix
    for that label.

  • Then, the weights of the co-seg network
    are optimized with the objective
    of maximizing part feature similarity
    within one label
    and minimizing similarity across different labels.

  • This amounts to minimizing
    the rank of the part feature matrix
    for each semantic label,
    while maximizing
    the rank of the joint part feature matrix
    for each pair of distinct semantic labels.

  • The runtime stage of the paper's pipeline
    jointly segments a set of unsegmented test shapes T = {S1, S2, ..., SN}
    to maximize consistency
    between the segmented parts.

  • The outputs are compared across the test set
    to ensure geometric consistency of corresponding segments:
    the quantitative metric is a group consistency energy,
    which is used as a loss function
    to iteratively refine the network's output
    via back-propagation.
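The per-label bookkeeping described in this list (K-way labels to binary masks, then per-label part feature matrices) can be sketched as follows. The helper names are hypothetical, the part features are placeholders for whatever the part prior network returns, and dropping the row of a shape that lacks a label is an assumption, not something the post states.

```python
import numpy as np

def kway_to_binary_masks(logits):
    # logits: (N, K) scores from the shared per-point classifier.
    labels = logits.argmax(axis=1)
    return [labels == k for k in range(logits.shape[1])]   # one FG/BG mask per label

def part_feature_matrices(shape_parts, K):
    """shape_parts: one dict per shape, {label: part feature vector}.
    M[i] stacks the label-i features of all shapes, one row per shape.
    Assumption: a shape with no label-i part contributes no row."""
    mats = []
    for i in range(K):
        rows = [parts[i] for parts in shape_parts if i in parts]
        mats.append(np.stack(rows) if rows else None)
    return mats

logits = np.array([[2.0, 0, 0], [0, 3.0, 0], [0, 0, 1.0], [5.0, 0, 0]])
masks = kway_to_binary_masks(logits)   # masks[0] marks points 0 and 3
```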

Architecture

  • First part: a classifier
    independently assigns one of K abstract labels {L1, L2, ..., LK}
    to each point in each shape, with shared weights;
    the set of points in a shape with label Li
    defines a single part with that label.

  • Since the classifier output may be noisy,
    the binary FG/BG map corresponding to each such part is passed
    through the pre-trained offline denoising network (part prior network).

  • Subsequent stages are deterministic and have no trainable parameters;
    they are used to compute the "group consistency energy" (the quantitative metric).

  • First, MSG features of the FG points for each part
    are max-pooled to yield a part descriptor.
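The max-pooling step is small enough to show directly. This is a minimal sketch with a hypothetical helper name and toy 2-D features standing in for the 128-D MSG features.

```python
import numpy as np

def part_descriptor(msg_feats, fg_mask):
    # Max-pool per-point MSG features (N, D) over the FG points of one part -> (D,).
    return msg_feats[fg_mask].max(axis=0)

msg = np.array([[1.0, 5.0], [3.0, 2.0], [0.0, 9.0]])        # 3 points, toy 2-D features
desc = part_descriptor(msg, np.array([True, True, False]))  # -> [3.0, 5.0]
```

Max-pooling (rather than averaging) keeps the descriptor invariant to point order and to how many points the part happens to contain.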

➡️ If the segmentation is consistent across shapes,
all parts with a given label Li should have similar descriptors.
✅ Therefore, stack the descriptors of all parts
with this label from all shapes
into a matrix Mi, one per row,
and try to minimize its second singular value,
a proxy for its rank (low rank == more consistent).

➡️ Parts with different labels should be distinct,
so the union of the rows of matrices Mi and Mj (i ≠ j) should have high rank.
✅ We want to maximize the second singular value of concat(Mi, Mj),
where the concat function constructs a new matrix
with the union of the rows of its inputs.
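A minimal NumPy sketch of the rank proxy and concat as described (rank := second singular value). The toy matrices are hypothetical: identical rows stand in for a perfectly consistent label.

```python
import numpy as np

def rank_proxy(M):
    # Second-largest singular value of M (NumPy returns them in descending
    # order); it is ~0 when all rows are (nearly) parallel.
    s = np.linalg.svd(M, compute_uv=False)
    return float(s[1]) if s.size > 1 else 0.0

def concat(Mi, Mj):
    # New matrix whose rows are the union of the rows of Mi and Mj.
    return np.vstack([Mi, Mj])

Mi = np.tile([1.0, 2.0, 3.0], (4, 1))    # 4 identical descriptors: a consistent label
Mj = np.tile([3.0, -1.0, 0.0], (4, 1))   # a different, also consistent label
print(rank_proxy(Mi))                    # ~0: low rank, consistent
print(rank_proxy(concat(Mi, Mj)))        # clearly > 0: the two labels are distinct
```

Using the second singular value instead of the integer rank gives a continuous, differentiable quantity, which is what makes it usable as a loss.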

The overall energy function combines both goals: it rewards a low rank for each Mi and a high rank for each concat(Mi, Mj),
where the rank function is the second singular value, computed by an SVD.
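One form consistent with these two objectives, written as a single energy to minimize, is below. It is reconstructed from the description above rather than copied from the paper, so the paper's exact weighting or normalization of the two sums may differ.

```latex
E = \sum_{i=1}^{K} \operatorname{rank}(M_i)
    \;-\; \sum_{1 \le i < j \le K} \operatorname{rank}\big(\operatorname{concat}(M_i, M_j)\big),
\qquad \operatorname{rank}(M) := \sigma_2(M)
```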

As this energy is optimized by gradient descent,
the initial layers of the network learn to propose
more and more consistent segmentations
across the test dataset.
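To see why gradient descent on σ2 yields more consistent parts, the toy below descends on the second singular value directly. In AdaCoSeg the gradient flows into the classifier weights; here, purely for illustration, the descriptors of one label are hypothetically treated as free variables. It relies on the standard fact that, for a non-repeated singular value, dσ2/dM = u2 v2ᵀ.

```python
import numpy as np

def sigma2_and_grad(M):
    # For a simple (non-repeated) singular value, d sigma_2 / dM = u_2 v_2^T.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[1], np.outer(U[:, 1], Vt[1])

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 8))      # 6 descriptors of one label, as free variables
sigma1_start = np.linalg.svd(M, compute_uv=False)[0]
sigma2_start = sigma2_and_grad(M)[0]

for _ in range(100):
    _, g = sigma2_and_grad(M)
    M -= 0.05 * g                    # gradient step that shrinks sigma_2

sigma1_end, sigma2_end = np.linalg.svd(M, compute_uv=False)[:2]
```

Each step removes only the second singular component, so the dominant direction (σ1) is untouched while the rows collapse toward it, i.e. the label becomes more consistent.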

Additionally, the paper found that
gaps between segments of a shape
appeared frequently and noticeably before re-composition
and were resolved arbitrarily by the subsequent softmax.
➡️ So a second energy term was added
that penalizes such gaps.
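The post does not give the gap term's formula, so the following is only a hypothetical illustration of what "penalizing gaps" could mean: after each part is denoised independently, a point that no refined part claims (total FG probability below 1) is counted as a gap.

```python
import numpy as np

def gap_penalty(refined_probs):
    # HYPOTHETICAL gap term, not the paper's formula. refined_probs: (K, N)
    # FG probabilities from the part prior network, one row per part. A point
    # whose total probability over all K parts is below 1 is partly "unclaimed".
    coverage = refined_probs.sum(axis=0)
    return float(np.maximum(0.0, 1.0 - coverage).mean())

p = np.array([[1.0, 0.0, 0.2],
              [0.0, 1.0, 0.3]])   # the third point falls into a gap
print(gap_penalty(p))             # 0.5 / 3 ≈ 0.1667
```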

👨🏻‍🔬 Experimental Results

Discriminative power of matrix ranks

  • Goal: show that matrix ranks provide a discriminative metric.
  • Hypothesis: matrix rank makes it easy
    to distinguish between collections
    with few distinct labels and collections with many distinct labels.
  • All part collections with more labels have higher scores (Fig. 6, right).
  • Conclusion: the rank-based metric accurately reflects the consistency of a part collection.
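A toy version of this hypothesis on synthetic data (not the paper's experiment): descriptors drawn from a single hypothetical part type score near zero, while a collection mixing four part types scores much higher.

```python
import numpy as np

def rank_proxy(M):
    # Second singular value as the rank proxy.
    return float(np.linalg.svd(M, compute_uv=False)[1])

rng = np.random.default_rng(0)
prototypes = rng.standard_normal((4, 16))   # 4 hypothetical part types, 16-D descriptors

def collection(n_labels, n_parts=24, noise=0.05):
    # n_parts noisy descriptors drawn round-robin from the first n_labels prototypes.
    labels = np.arange(n_parts) % n_labels
    return prototypes[labels] + noise * rng.standard_normal((n_parts, 16))

consistent = rank_proxy(collection(1))      # one label: near rank 1, small score
mixed = rank_proxy(collection(4))           # four labels: clearly higher score
```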

Control, adaptivity, and generalization

  • Fig. 7 ➡️ co-segmentation of the same shapes
    for different values of K.
    The method produces coarse-to-fine part hierarchies.
    🔥 However, this nesting structure is not guaranteed by the method; this is left as future work.

  • Fig. 1 ➡️ Co-segmentations of two different chair collections, both with K = 4.
    The collection on the left has several chairs with arms;
    hence, the optimization detects arms
    as one of the prominent parts
    and groups all chair legs into a single segment.
    🆚 The other collection has no arms,
    hence the four part types are assigned to back, seat, front legs, and back legs.

Quantitative evaluation

  • Table 3 shows that AdaCoSeg can even
    improve the segmentation quality of its own training data.

  • Fig. 9 demonstrates a significant improvement
    of the paper's co-segmentation over the noisy training data.

🌲 Limitations

  • The paper reiterates that the online co-seg network does not generalize to new inputs, which is by design:
    the network weights are derived to minimize the loss function for the current input set
    and recomputed for each new set.

  • AdaCoSeg is not trained end-to-end,
    although an end-to-end deep co-seg network would be desirable.

  • It can handle some intra-category variation,
    but learning parts and their feature descriptors
    with all categories mixed together is more challenging.

  • The learned network weights cannot be continuously updated as new shapes come in.
