YOLO - object detection model

Jiyoon · February 28, 2022

Machine Learning

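I was lucky enough to land my first-ever on-campus contract job.
Nothing specific has been decided yet about what I will be doing; within a broad area (machine learning, image classification) I write whatever program the professor needs at the time.
I had always wanted to study machine learning, so I am really happy to get a chance to learn while building my career like this, haha.
For a start, my senior (the person who handled this work before me) told me to look into YOLO object detection, so I put these notes together from English-language blogs and videos.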

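YOLO (You Only Look Once) - a deep learning algorithm

It is far faster than earlier deep learning object detectors, which is what made it so well known.

To run the algorithm you need one of the following three deep learning frameworks.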

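Darknet: the framework written by YOLO's developers

Very fast, and runs on both CPU and GPU.
The original release mainly targets Linux; running it on other operating systems generally takes a community fork or extra setup.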

Darkflow: a port of Darknet to TensorFlow.

Fast, can use CPU or GPU, and works on any operating system.
Very tricky to install.

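OpenCV:

Nothing needs to be installed besides OpenCV itself.
The standard prebuilt packages run the network on the CPU only, which makes real-time video processing difficult (a CUDA backend exists since OpenCV 4.2, but it requires building OpenCV from source).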

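Three files are needed to use the algorithm.

Weight file: the trained model, the core of the algorithm that actually detects objects.

Cfg file: the configuration file that defines the network's layers and settings.

Names file: the list of class names the algorithm can detect.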
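
Since all three files have to be present before the network can be loaded, a quick pre-flight check can save a confusing error later. This is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file names are simply the ones used in this post.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not exist as regular files, in the given order."""
    return [p for p in paths if not Path(p).is_file()]


# File names used later in this post; adjust them to wherever the files were downloaded.
REQUIRED = ["yolov3.weights", "yolov3.cfg", "coco.names"]

missing = missing_files(REQUIRED)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All YOLO files found.")
```

Running this before loading the network makes it obvious which download is still missing.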

Code walkthrough

Loading YOLO

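import cv2
import numpy as np

# Load YOLO: pretrained weights plus the network configuration
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# One class name per line in the names file
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f]
layer_names = net.getLayerNames()
# getUnconnectedOutLayers() returns 1-based indices; flatten() copes with both flat and nested results
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
# One random color per class
colors = np.random.uniform(0, 255, size=(len(classes), 3))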
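
`getUnconnectedOutLayers()` returns 1-based indices, which is why the list comprehension subtracts 1. As far as I can tell, older OpenCV releases wrapped each index in its own one-element array while newer ones return a flat array, so the same line can behave differently depending on the installed version. The sketch below shows the lookup in plain Python with made-up layer names; `pick_output_layers` is a hypothetical helper, not an OpenCV function.

```python
def pick_output_layers(layer_names, unconnected):
    """Map 1-based output-layer indices to layer names.

    Accepts both a flat sequence such as [3, 5] and a nested one such as
    [[3], [5]], since the shape of the returned indices differs between versions.
    """
    names = []
    for idx in unconnected:
        # Unwrap a one-element container if the index is nested.
        if hasattr(idx, "__len__"):
            idx = idx[0]
        names.append(layer_names[int(idx) - 1])  # indices start at 1, lists at 0
    return names


# Toy layer list for illustration only; a real YOLOv3 network has far more layers.
toy_layers = ["conv_0", "conv_1", "yolo_a", "conv_2", "yolo_b"]
print(pick_output_layers(toy_layers, [3, 5]))      # flat indices
print(pick_output_layers(toy_layers, [[3], [5]]))  # nested indices
```

Both calls select the same two layers, so the rest of the script does not need to care which index shape it received.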

Load the image in which you want to detect objects, and get its height and width.

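# Load the image
img = cv2.imread("room_ser.jpg")
if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("room_ser.jpg")
# Scale the image down to 40% of its original size
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape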

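The image cannot be fed to the network at its original size, so it first has to be converted into a blob.

Blob

A blob is the preprocessed array the network takes as input: building it resizes the image to a fixed size and rescales its pixel values.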

YOLO is typically run at one of three input sizes:

- 320 x 320: small, so lower accuracy but fast

- 608 x 608: large, so higher accuracy but slow

- 416 x 416: in between, with moderate accuracy and speed
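
The speed difference comes largely from how many candidate boxes the network has to score. Assuming the standard YOLOv3 layout (three output scales with strides 32, 16 and 8, and 3 anchor boxes per grid cell, which is what the stock `yolov3.cfg` uses as far as I know), the count can be worked out in a few lines. The input size has to be divisible by 32, which is why the largest size is 608 rather than 609. `num_predictions` is a hypothetical helper for illustration.

```python
STRIDES = (32, 16, 8)   # assumed downsampling factor of each YOLOv3 output scale
ANCHORS_PER_CELL = 3    # assumed number of anchor boxes predicted per grid cell


def num_predictions(size):
    """Number of candidate boxes YOLOv3 produces for a square size x size input."""
    if size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    return sum((size // s) ** 2 for s in STRIDES) * ANCHORS_PER_CELL


for size in (320, 416, 608):
    print(size, num_predictions(size))
# 320 6300
# 416 10647
# 608 22743
```

Going from 320 to 608 roughly quadruples the number of boxes to evaluate, which matches the accuracy and speed trade-off listed above.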

Detecting objects

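# Detecting objects
# 0.00392 is roughly 1/255 and scales pixels to the 0..1 range; swapRB turns OpenCV's BGR order into RGB
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)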
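
To make the arguments of `blobFromImage` less opaque: `0.00392` is roughly 1/255, so pixel values end up between 0 and 1, and the `True` argument is the `swapRB` flag that turns OpenCV's BGR channel order into RGB. The result is a 4-D array laid out as (batch, channels, height, width). The following plain-Python sketch mimics only the scaling, channel swap and re-layout on a tiny nested list; it skips the resize step and is not how OpenCV implements it. `to_blob` is a hypothetical name.

```python
def to_blob(image_bgr, scale=1 / 255):
    """Turn an H x W x 3 nested list of BGR pixels into a 1 x 3 x H x W nested list.

    Channels come out in RGB order and every value is multiplied by `scale`.
    """
    height = len(image_bgr)
    width = len(image_bgr[0])
    planes = []
    for channel in (2, 1, 0):  # pick R, then G, then B from each BGR pixel
        plane = [[image_bgr[y][x][channel] * scale for x in range(width)]
                 for y in range(height)]
        planes.append(plane)
    return [planes]  # the outer list is the batch dimension (one image)


# A 1 x 2 image: one pure-blue pixel and one pure-red pixel, in BGR order.
tiny = [[(255, 0, 0), (0, 0, 255)]]
tiny_blob = to_blob(tiny)
# Sizes along batch, channels, height, width
print(len(tiny_blob), len(tiny_blob[0]), len(tiny_blob[0][0]), len(tiny_blob[0][0][0]))
```

For the real 416 x 416 call the same layout gives a blob of shape (1, 3, 416, 416).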

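outs holds everything the network found: one array per output layer, with one row per candidate detection.

Each row contains the object's position (bounding box coordinates) and its class scores (confidence scores).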

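Bounding box

Some models return the rectangle's top-left corner (x1, y1) and bottom-right corner (x2, y2).
The coordinate format differs from model to model; YOLO returns the box center, width, and height as fractions of the image size.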
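
Converting between the two box formats is simple arithmetic. A small sketch, assuming YOLO-style input (center and size as fractions of the image) and pixel corners as output; `yolo_to_corners` is a hypothetical name.

```python
def yolo_to_corners(cx, cy, w, h, img_width, img_height):
    """Convert a normalized (center x, center y, width, height) box
    into pixel corners (x1, y1, x2, y2)."""
    box_w = w * img_width
    box_h = h * img_height
    x1 = int(cx * img_width - box_w / 2)
    y1 = int(cy * img_height - box_h / 2)
    x2 = int(x1 + box_w)
    y2 = int(y1 + box_h)
    return x1, y1, x2, y2


# A box centered in a 400 x 200 image, covering half its width and half its height.
print(yolo_to_corners(0.5, 0.5, 0.5, 0.5, 400, 200))  # (100, 50, 300, 150)
```

The detection loop further down does the first half of this conversion and stores boxes as top-left corner plus width and height.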

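Confidence score

The highest of the class probabilities predicted for the object inside the box.
ex) If the object is predicted to be a desk with a confidence score of 0.8 → the model puts the probability that it is a desk at 80%.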
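
For a COCO-trained YOLOv3 model each detection is a vector of 85 numbers: 4 box values, 1 objectness score, then 80 class scores, which is why the code below slices `detection[5:]`. Here is a plain-Python sketch of picking the class and its confidence from such a vector; the three-class vector is made up for illustration and `best_class` is a hypothetical helper.

```python
def best_class(detection):
    """Return (class_id, confidence) for one detection vector.

    Layout assumed: [cx, cy, w, h, objectness, score_0, score_1, ...].
    """
    scores = detection[5:]
    # Index of the largest score; plays the same role as np.argmax.
    class_id = max(range(len(scores)), key=scores.__getitem__)
    return class_id, scores[class_id]


# Made-up detection with three classes instead of COCO's 80.
example = [0.5, 0.5, 0.2, 0.3, 0.9, 0.05, 0.8, 0.15]
print(best_class(example))  # (1, 0.8)
```

Taking only the maximum score would give the same confidence, but the index is what tells us which class name to print.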

At this point detection itself is finished; all that remains is to show the result on screen.

Storing the results

# Collect class ids, confidences and boxes of confident detections
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # scores.max() would give the same confidence, but class_id is needed too, hence argmax
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

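Loop over the arrays in outs, read each detection's confidence, and keep only the detections above the confidence threshold.

Confidence threshold

The minimum probability (confidence score) we are willing to accept.

ex) With a confidence threshold of 0.5 → a detection whose confidence score is above 0.5 is treated as a real object; anything at or below it is discarded.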
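
The effect of the threshold is easy to see on a handful of made-up detections. This is only an illustrative sketch in plain Python; `keep_confident` is a hypothetical helper and the labels and scores are invented.

```python
def keep_confident(detections, threshold=0.5):
    """Keep (label, confidence) pairs whose confidence is strictly above the threshold."""
    return [(label, conf) for label, conf in detections if conf > threshold]


# Made-up detections for illustration.
detections = [("chair", 0.92), ("desk", 0.55), ("cup", 0.31)]
for threshold in (0.3, 0.5, 0.9):
    print(threshold, keep_confident(detections, threshold))
```

A lower threshold lets more (possibly wrong) detections through, while a higher one keeps only the boxes the model is most sure about.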

np.argmax, np.argmin, np.where

Return the index of the maximum, the index of the minimum, and the indices where a condition holds, respectively.
https://rfriend.tistory.com/356

Removing noise

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

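Detection often produces several boxes for the same object → to get rid of this noise we use a technique called non-maximum suppression.

Non maximum suppression

Boxes scoring below the confidence threshold we chose (0.5) are dropped; then, among boxes that overlap each other by more than the NMS threshold (0.4), only the highest-scoring one is kept.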
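
A greedy version of the idea fits in a few lines. This is a simplified plain-Python sketch, not OpenCV's implementation: boxes are `[x, y, w, h]` as in the code above, overlap is measured as intersection over union (IoU), and the helper names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Drop low-scoring boxes, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-identical boxes around one object plus one separate box (made-up values).
sample_boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 300, 40, 40]]
sample_scores = [0.9, 0.75, 0.6]
print(nms(sample_boxes, sample_scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.76, so it is suppressed, while the third box does not overlap anything and survives.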

Displaying the detected objects on screen

font = cv2.FONT_HERSHEY_PLAIN
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        color = colors[class_ids[i]]  # index by class, not by box, so the index cannot run past len(colors)
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), font, 3, color, 3)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Using the stored box coordinates and class ids of the boxes that survived NMS, draw each bounding box and the object's name on the image.

Full code

import cv2
import numpy as np

# Load Yolo
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
classes = []
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Loading image
img = cv2.imread("IMG_7508.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)

net.setInput(blob)
outs = net.forward(output_layers)

# Collect class ids, confidences and boxes of confident detections
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)

            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
print(indexes)
font = cv2.FONT_HERSHEY_PLAIN
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        color = colors[class_ids[i]]
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), font, 3, color, 3)

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

How to detect custom objects

To detect custom objects, a pretrained model is not enough; you have to train a YOLO model yourself.

Build a dataset of images that contain the objects you want to detect.
Train a YOLO model on those images.
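
For the dataset step, Darknet-style training expects one text file per image with a line `class_id cx cy w h` per object, where the four box values are fractions of the image size. Using that format is an assumption about the training setup (other tools use other formats). Below is a sketch of producing such a line from a pixel box; `to_yolo_label` is a hypothetical helper.

```python
def to_yolo_label(class_id, x, y, w, h, img_width, img_height):
    """Format a pixel box (top-left x, y plus width, height) as a
    Darknet-style label line with values normalized to 0..1."""
    cx = (x + w / 2) / img_width
    cy = (y + h / 2) / img_height
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_width:.6f} {h / img_height:.6f}"


# Class 0, a 200 x 100 box whose top-left corner is at (100, 50) in a 400 x 200 image.
print(to_yolo_label(0, 100, 50, 200, 100, 400, 200))
# 0 0.500000 0.500000 0.500000 0.500000
```

This is the inverse of the conversion done in the detection loop, where normalized center coordinates are turned back into pixel boxes.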

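In the end, to build a program that is useful in a specific situation, I will have to create a model from my own dataset...
Now that I have at least a shallow understanding of YOLO and object detection, next time I should try writing even a simple program with a custom model 😀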
