- RoI (Region of Interest) = Selective Search
- Object or no object = objectness score
- Bounding box prediction (bbox regression)
- Class prediction for each box
- Non-maximum suppression (NMS): keep only the highest-probability box (see the IoU/NMS sketch below)
Check how much the boxes overlap by computing the IoU between the predicted boxes and the ground-truth boxes.
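As a concrete reference for the IoU check and the NMS step above, here is a minimal sketch in plain Python/NumPy; the function names `iou` and `nms` and the 0.5 overlap threshold are illustrative choices, not from the original notes:

```python
import numpy as np

def iou(boxA, boxB):
    # boxes are (startX, startY, endX, endY)
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0, xB - xA) * max(0, yB - yA)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    return inter / float(areaA + areaB - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # keep the highest-scoring box, drop everything that overlaps it
    # too much, then repeat with the remaining boxes
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```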
Training compares the ground truth against the object hypotheses (positive examples <-> hard negatives).
The network is trained to shrink the distance between the object's ground-truth bounding box and the prediction produced by the network filters described above.
```python
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# load the image and preprocess it
image = load_img(imagePath, target_size=(224, 224))
image = img_to_array(image)
data.append(image)
targets.append((startX, startY, endX, endY))
filenames.append(filename)
```
...
```python
from sklearn.model_selection import train_test_split

# partition the data: 90% training, 10% testing
split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)
(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]
```
```python
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# flatten the max-pooling output of VGG
flatten = vgg.output
flatten = Flatten()(flatten)
# construct a fully-connected layer head to output the predicted
# bounding box coordinates
bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
# 4 outputs: startX, startY, endX, endY (normalized to [0, 1] by the sigmoid)
bboxHead = Dense(4, activation="sigmoid")(bboxHead)
# construct the model we will fine-tune for bounding box regression
model = Model(inputs=vgg.input, outputs=bboxHead)
opt = Adam(learning_rate=config.INIT_LR)
model.compile(loss="mse", optimizer=opt)
print(model.summary())
```
```python
# train the network for bounding box regression
H = model.fit(
    trainImages, trainTargets,
    validation_data=(testImages, testTargets),
    batch_size=config.BATCH_SIZE,
    epochs=config.NUM_EPOCHS,
    verbose=1)
```
```python
from tensorflow.keras.models import load_model

# load the trained model and predict the box for one preprocessed image
model = load_model(config.MODEL_PATH)
preds = model.predict(image)[0]
(startX, startY, endX, endY) = preds
```
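Because the sigmoid head outputs coordinates normalized to [0, 1], they must be scaled back to pixel units before drawing; a minimal sketch, assuming `w` and `h` hold the original image width and height:

```python
# scale the normalized predictions back to the original image size
startX, endX = int(startX * w), int(endX * w)
startY, endY = int(startY * h), int(endY * h)
```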
Anchor
During the convolution, anchor boxes are generated around each anchor point (sliding window), and regression and classification are performed on them; instead of full per-class scores, only whether an object is present is predicted. This is done with three small conv layers (a code sketch follows the list below):
1️⃣ Conv 3x3, 512 channels
2️⃣ Conv 1x1 for the object/background decision (cls layer, 2k scores)
3️⃣ Conv 1x1 for bbox regression (reg layer, 4k coordinates)
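A minimal PyTorch sketch of this RPN head; the class name `RPNHead` and the default of k=9 anchors per location are illustrative assumptions:

```python
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_channels=512, k=9):  # k = anchor boxes per location
        super().__init__()
        # 1) 3x3 conv, 512 channels, sliding over the feature map
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        # 2) 1x1 conv for the object/background decision (2k scores)
        self.cls = nn.Conv2d(512, 2 * k, kernel_size=1)
        # 3) 1x1 conv for bbox regression (4k coordinates)
        self.reg = nn.Conv2d(512, 4 * k, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, feature_map):
        x = self.relu(self.conv(feature_map))
        return self.cls(x), self.reg(x)
```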
L_cls = objectness loss (2 classes: object / background)
p = predicted object probability
p* = ground-truth label (1 if IoU > 0.7, 0 if IoU < 0.3; anchors in between are ignored during training)
λ is multiplied in to balance the classification and regression terms (chosen empirically)
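For reference, these terms come from the RPN loss in the Faster R-CNN paper:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $t_i$ are the predicted box offsets and $t_i^*$ the ground-truth offsets; the factor $p_i^*$ means the regression loss is only active for positive anchors.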
| | R-CNN | Fast R-CNN | Faster R-CNN |
|---|---|---|---|
| Time per image | 50s | 2s | 0.2s |
| Speed | 1x | 25x | 250x |
*Even so, it is still slower than YOLO.
N = number of matched boxes
l = predicted box
g = ground-truth box
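These symbols match the multibox loss from the SSD paper, which combines a confidence term and a localization term:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where $x$ are the box-matching indicators, $c$ the class confidences, and $\alpha$ a balancing weight.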
The loss function is defined this way and used to train the weight vector.
This form is applied per anchor box, and the losses over the 7x7 grid cells are combined into a single loss for training.
A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network (earlier detectors were slow because they kept a separate RoI stage), it can be optimized end-to-end directly on detection performance. Our base YOLO model processes images in real-time at 45 frames per second.
```python
CNNBlock(3, 64, kernel_size=7, stride=2, padding=3)
MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
```
Modularizing this code, it can be expressed as follows:
```python
import torch.nn as nn

class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(CNNBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.leakyrelu = nn.LeakyReLU(0.1)

    def forward(self, x):
        # conv -> batch norm -> LeakyReLU, the basic YOLOv1 building block
        return self.leakyrelu(self.batchnorm(self.conv(x)))
```
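A quick shape check of the block, assuming YOLOv1's 448x448 RGB input:

```python
import torch

block = CNNBlock(3, 64, kernel_size=7, stride=2, padding=3)
x = torch.randn(1, 3, 448, 448)
print(block(x).shape)  # torch.Size([1, 64, 224, 224])
```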