CSPNet 논문에 나오길래 무슨 뜻인가 봤는데("In ResNeXt [39], Xie et al. first demonstrate that cardinality can be more effective than the dimensions of width and depth.")
p 1. "CNNs require a fixed input image size (e.g., 224 224), which limits both the aspect ratio and the scale of the input image"
p 1. "In fact, convolutional layers do not require a fixed image size and can generate feature maps of any sizes. On the other hand, the fully-connected layers need to have fixedsize/length input by their definition. Hence, the fixedsize constraint comes only from the fully-connected layers, which exist at a deeper stage of the network."
제시하는 해결 방법
p 1. "Specifically, we add an SPP layer on top of the last convolutional layer. The SPP layer pools the features and generates fixedlength outputs, which are then fed into the fullyconnected layers (or other classifiers)"
-
SPP layer
p 3. "Spatial pyramid pooling [14], [15] improves BoW in that it can maintain spatial information by pooling in local spatial bins. These spatial bins have sizes proportional to the image size, so the number of bins is fixed regardless of the image size"
p 3. "To adopt the deep network for images of arbitrary sizes, we replace the last pooling layer (e.g., pool5, after the last convolutional layer) with a spatial pyramid pooling layer. Figure 3 illustrates our method. In each spatial bin, we pool the responses of each filter (throughout this paper we use max pooling). The outputs of the spatial pyramid pooling are kM-dimensional vectors with the number of bins denoted as M (k is the number of filters in the last convolutional layer). The fixed-dimensional vectors are the input to the fully-connected layer."
어떻게 해서 fix-sized length vector가 탄생할까?
p 4. "For the network to accept 180 180 inputs, we implement another fixed-size-input (180 180) network. The feature map size after conv5 is a a = 10 10 in this case. Then we still use win = da=ne and str = ba=nc to implement each pyramid pooling level. The output of the spatial pyramid pooling layer of this 180-network has the same fixed length as the 224-network"