Previous few-shot segmentation methods do not separate extracting features of the target object in the support set from segmenting the query image. This can be problematic because the segmentation model's representation becomes mixed with the features extracted from the support set.
This paper proposes a model that separates these two steps into prototype extraction and non-parametric metric learning. In addition, the paper makes use of the mask annotations in the few-shot learning process.
2. Method
Overall structure
2.1. Prototype learning
The proposed model leverages the mask annotations over the support images to learn separate prototypes for the foreground and the background.
There are two strategies for exploiting the segmentation masks. Early fusion masks the support image before feature extraction. Late fusion instead masks the feature map, so that foreground and background are treated differently. The paper uses late fusion.
Equation (1) gives the foreground prototype and Equation (2) gives the background prototype.
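The late-fusion prototype computation can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's code: it assumes a single support image, a feature map of shape (C, H, W), and a binary mask already resized to the feature map resolution. The function names and the `eps` stabiliser are hypothetical.

```python
import numpy as np

def masked_average_prototype(features, mask, eps=1e-8):
    """Average the feature vectors at locations where mask == 1.

    features: (C, H, W) feature map; mask: (H, W) binary mask.
    Returns a (C,) prototype vector. eps is an assumed stabiliser
    that avoids division by zero for an empty mask.
    """
    mask = mask.astype(features.dtype)
    weighted_sum = (features * mask[None, :, :]).sum(axis=(1, 2))
    return weighted_sum / (mask.sum() + eps)

def support_prototypes(features, fg_mask):
    """Late fusion: mask the feature map, not the input image."""
    fg = masked_average_prototype(features, fg_mask)
    bg = masked_average_prototype(features, 1 - fg_mask)
    return fg, bg

# Toy 2-channel, 2x2 feature map: top row is foreground.
features = np.zeros((2, 2, 2))
features[0] = [[1.0, 1.0], [0.0, 0.0]]
features[1] = [[0.0, 0.0], [2.0, 2.0]]
fg_mask = np.array([[1, 1], [0, 0]])

fg_proto, bg_proto = support_prototypes(features, fg_mask)
```

In this toy case the foreground prototype is close to (1, 0) and the background prototype is close to (0, 2), since each is just the mean feature vector over its own region.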
2.2. Non-parametric metric learning
To segment the query image, the model first computes the distance between the query feature vector at each spatial location and each prototype computed in Section 2.1. Second, a softmax is applied over the distances to produce a probability map. The distance is based on cosine similarity.
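The two steps above might look like the following in NumPy. This is a sketch under assumptions: the softmax is taken over scaled cosine similarities (equivalently, negated distances), and the `scale` factor and function names are illustrative choices, not details stated in this summary.

```python
import numpy as np

def cosine_similarity_map(query_features, prototype, eps=1e-8):
    """Cosine similarity between each spatial location and one prototype.

    query_features: (C, H, W); prototype: (C,). Returns (H, W).
    """
    dot = np.einsum("chw,c->hw", query_features, prototype)
    q_norm = np.linalg.norm(query_features, axis=0)
    p_norm = np.linalg.norm(prototype)
    return dot / (q_norm * p_norm + eps)

def segment_query(query_features, prototypes, scale=20.0):
    """Return per-class probability maps (N, H, W) and a label map (H, W).

    scale is an assumed temperature that sharpens the softmax.
    """
    sims = np.stack([cosine_similarity_map(query_features, p)
                     for p in prototypes])
    logits = scale * sims
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=0, keepdims=True)
    return probs, probs.argmax(axis=0)

# Class 0 = background, class 1 = foreground (ordering is an assumption).
bg_proto = np.array([0.0, 1.0])
fg_proto = np.array([1.0, 0.0])

# 2-channel, 1x2 query feature map: left pixel looks like foreground.
query_features = np.array([[[1.0, 0.0]],
                           [[0.0, 1.0]]])

probs, labels = segment_query(query_features, [bg_proto, fg_proto])
```

Because no learned parameters sit between the prototypes and the prediction, the same routine can be reused for any number of classes by passing a longer prototype list.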
2.3. Prototype alignment regularization (PAR)
This module makes it easier to extract general features from the support set images to guide few-shot segmentation (FSS).
If the model predicts a good segmentation mask, then prototypes learned from the query set should in turn be able to segment the support set images. Thus, PAR takes the query image and its predicted mask as the new support set and treats the old support set as the new query images, as shown in Figure 2 (b).
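The role swap in PAR can be illustrated with a short NumPy sketch. It assumes one support image, one foreground class, and a pixel-wise cross-entropy to score how well the query-derived prototypes segment the support image; the summary does not spell out the loss, so that choice, the `scale` factor, and all function names here are assumptions.

```python
import numpy as np

def prototype(features, mask, eps=1e-8):
    """Masked average of a (C, H, W) feature map over a binary (H, W) mask."""
    mask = mask.astype(features.dtype)
    return (features * mask[None]).sum(axis=(1, 2)) / (mask.sum() + eps)

def class_probabilities(features, prototypes, scale=20.0, eps=1e-8):
    """Softmax over scaled cosine similarities; returns (N, H, W)."""
    norms = np.linalg.norm(features, axis=0)
    sims = np.stack([
        np.einsum("chw,c->hw", features, p) / (norms * np.linalg.norm(p) + eps)
        for p in prototypes
    ])
    logits = scale * sims
    logits = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)

def par_loss(support_features, support_mask, query_features, eps=1e-8):
    # Forward direction: support prototypes segment the query.
    protos = [prototype(support_features, 1 - support_mask),
              prototype(support_features, support_mask)]
    query_pred = class_probabilities(query_features, protos).argmax(axis=0)

    # Reverse direction: the query and its predicted mask act as the new
    # support set, and the old support image acts as the new query.
    reverse_protos = [prototype(query_features, 1 - query_pred),
                      prototype(query_features, query_pred)]
    support_probs = class_probabilities(support_features, reverse_protos)

    # Assumed scoring: cross-entropy against the true support mask.
    true_probs = np.take_along_axis(support_probs, support_mask[None], axis=0)[0]
    return float(-np.log(np.clip(true_probs, eps, 1.0)).mean())

# Toy example: the query shows the same two feature patterns, mirrored.
support_features = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
support_mask = np.array([[1, 0]])
query_features = np.array([[[0.0, 1.0]], [[1.0, 0.0]]])

loss = par_loss(support_features, support_mask, query_features)
```

Here the query prediction is good, so the prototypes built from it recover the support mask and the alignment loss is close to zero. A poor query prediction would yield prototypes that segment the support image badly and therefore a larger loss.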