[Coding] Reading the COCO Dataset

Dokyeong Kwon · October 4, 2020

🙋‍♀️ Today I'd like to summarize what I learned about using the COCO Dataset, a dataset for Object Detection, Segmentation, Keypoint Detection, and more, with PyTorch.

1. Download COCO

https://cocodataset.org/#home 👈 You can download the dataset from this site.

I'll download the 2017 Train images and the 2017 Train/Val annotations for this walkthrough.

After unzipping the downloaded files, you'll find JPG images in the image folders (train2017, val2017) and JSON files in the annotations folder.

The annotations come in three kinds of files, one per purpose: captions, instances, and person_keypoints.

captions : text descriptions of the image
instances : categories and region masks of the people/objects in the image
person_keypoints : human pose data

👉 I only need the instances information, so that's all I'll use.

Let's take a quick look at the data. Object detection needs both images and annotations! The image folders contain a wide variety of photos.

Now let's open instances_val2017.json. 👀 Opened in Notepad, the content is surprisingly hard to read! A JSON beautifier makes the file much easier to read.

Download and install the build for your operating system! 👉 https://stedolan.github.io/jq/

Go to the directory containing the JSON file and run jq . instances_val2017.json > instances_val2017_jq.json

Running this command creates a cleanly formatted JSON file!
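
If installing jq isn't convenient, Python's built-in json module can do similar pretty-printing. The sketch below is only an illustration: the helper names summarize_top_level and pretty_print_file, and the output file name in the usage note, are hypothetical and not part of the original workflow.

```python
import json


def summarize_top_level(data):
    """Describe each top-level key of a loaded COCO-style JSON dict."""
    summary = {}
    for key, value in data.items():
        if isinstance(value, list):
            summary[key] = "list of %d items" % len(value)
        else:
            summary[key] = type(value).__name__
    return summary


def pretty_print_file(src_path, dst_path):
    """Rewrite a JSON file with indentation, similar to `jq . src > dst`."""
    with open(src_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    # Return a short overview so the structure is visible without opening the file
    return summarize_top_level(data)
```

For example, pretty_print_file("instances_val2017.json", "instances_val2017_pretty.json") would write an indented copy and return an overview of the top-level keys. Note that this loads the whole file into memory, which can take a while for the larger train annotations.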

"info": {
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
  }
  ...
"area": 1037.7818999999995,
      "iscrowd": 0,
      "image_id": 397133,
      "bbox": [
        0,
        262.81,
        62.16,
        36.77
      ],
      "category_id": 1,
      "id": 1218137
    }

"segmentation": [
        [
          292.37,
          425.1,
          340.6,
          373.86,
          347.63,
          256.31,
          198.93,
          240.24,
          4.02,
          311.57,
          1,
          427,
          291.36,
          427
        ]
      ]

As you can see, the file contains bbox and segmentation information for the objects in each image.
Now let's prepare the data loader. I wrote it with reference to the GitHub repository below 😊
https://github.com/csm-kr/yolo_v2_vgg16_pytorch/blob/master/dataset/coco_dataset.py
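
To make the bbox and segmentation fields from the JSON excerpt more concrete, here is a small sketch that reuses the numbers shown above. COCO stores bbox as [x, y, width, height] and a polygon segmentation as a flat list [x1, y1, x2, y2, ...]. The function names xywh_to_xyxy and polygon_to_points are my own, chosen for illustration.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO bbox [x, y, width, height] to corners [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def polygon_to_points(flat_polygon):
    """Turn a flat [x1, y1, x2, y2, ...] list into a list of (x, y) pairs."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    return list(zip(xs, ys))


# Values copied from the JSON excerpt above
bbox = [0, 262.81, 62.16, 36.77]
polygon = [292.37, 425.1, 340.6, 373.86, 347.63, 256.31, 198.93,
           240.24, 4.02, 311.57, 1, 427, 291.36, 427]

corners = xywh_to_xyxy(bbox)
points = polygon_to_points(polygon)

print("corners:", corners)
print("number of polygon vertices:", len(points))
```

The bbox and the polygon in the excerpt are separate fragments, so they do not necessarily describe the same object.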

2. Preparing the Data

Store the downloaded data in the following directory layout!

root -- images      -- train2017
     |              |- val2017
     |              |- test2017
     | 
     -- annotations -- instances_train2017.json
                    |- instances_val2017.json  * minival

3. Creating the Project

First, create a project in PyCharm.

To build the data loader, the following libraries need to be installed so they can be imported! I'll post about setting up a virtual environment soon!

PyTorch
PIL
numpy
pycocotools
matplotlib

👉 If you hit an error while installing pycocotools on Windows, try the following steps in Anaconda Prompt!

1. pip install cython

2. conda install git

3. pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
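
After installing, a quick way to confirm that everything is importable is to ask Python which modules it can find. This is only a sketch; find_missing is a hypothetical helper, and the module names are the import names of the libraries listed above (Pillow is imported as PIL).

```python
import importlib.util


def find_missing(module_names):
    """Return the module names that Python cannot locate in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


required = ["torch", "PIL", "numpy", "pycocotools", "matplotlib"]
print("missing:", find_missing(required))
```

An empty list means all five libraries can be found in the current environment.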

4. Data Loader

import os
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from pycocotools.coco import COCO
from transform import transform_COCO
from PIL import Image
import matplotlib
from matplotlib.patches import Rectangle

Import the modules above, then create the data loader class.

Set root_dir in the __init__ arguments carefully so you don't run into path errors! We'll look at the validation files here.

class COCO_Dataset(Dataset):
    def __init__(self, root_dir=r'D:\Data\coco', set_name='val2017', split='TRAIN'):
        super().__init__()
        self.root_dir = root_dir
        self.set_name = set_name
        # load the instances annotation file for the chosen set via pycocotools
        self.coco = COCO(os.path.join(self.root_dir, 'annotations', 'instances_' + self.set_name + '.json'))

Next, drop the images that have no labels and collect the remaining image IDs in a list for use in training or testing!

        whole_image_ids = self.coco.getImgIds()  # original length of train2017 is 118287

        self.image_ids = []

        # to remove not annotated image idx
        self.no_anno_list = []

        for idx in whole_image_ids:
            annotations_ids = self.coco.getAnnIds(imgIds=idx, iscrowd=False)
            if len(annotations_ids) == 0:
                self.no_anno_list.append(idx)
            else:
                self.image_ids.append(idx)

        self.load_classes()  # read class information
        self.split = split

COCO has 80 classes, such as person, bicycle, and car.
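
The load_classes() and coco_label_to_label() methods are called in the code but not shown in this post. COCO category IDs are not contiguous (there are 80 classes, but the IDs run up to 90), so a mapping to contiguous labels is typically needed. The sketch below shows one way this could work, assuming the categories are sorted by ID and numbered from 0; build_label_maps is a hypothetical name, and the four categories are just a small excerpt.

```python
def build_label_maps(categories):
    """Map non-contiguous COCO category IDs to contiguous labels 0..N-1.

    `categories` is a list of dicts with at least 'id' and 'name' keys,
    the same shape pycocotools returns for categories.
    """
    ordered = sorted(categories, key=lambda c: c["id"])
    coco_to_label = {}
    label_to_name = {}
    for label, cat in enumerate(ordered):
        coco_to_label[cat["id"]] = label
        label_to_name[label] = cat["name"]
    return coco_to_label, label_to_name


# Small excerpt of COCO categories; note the gap between ID 3 and ID 13
sample_categories = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bicycle"},
    {"id": 3, "name": "car"},
    {"id": 13, "name": "stop sign"},
]
coco_to_label, label_to_name = build_label_maps(sample_categories)
print(coco_to_label)
print(label_to_name)
```

With pycocotools, the full category list can be obtained with self.coco.loadCats(self.coco.getCatIds()) and passed to a helper like this one.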

Now let's open a single image and display its bboxes! A Dataset's __getitem__() method usually opens one image and reads its bounding boxes. Let's work with an image together with its bounding box and segmentation information!

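def __getitem__(self, idx):

        # visualization flag (not used further in this excerpt)
        visualize = True

        # PIL image plus its (width, height)
        image, (w, h) = self.load_image(idx)

        # (N, 5) array with rows of [x1, y1, x2, y2, label]
        annotation = self.load_annotations(idx)

        boxes = torch.FloatTensor(annotation[:, :4])
        labels = torch.LongTensor(annotation[:, 4])

        if labels.nelement() == 0:  # no labeled img exists.
            visualize = True
        # data augmentation; transform_COCO comes from the project's own transform module,
        # and this call expects it to return four values
        image, boxes, labels, segmentations = transform_COCO(image, boxes, labels, self.split)

        return image, boxes, labels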

🙋‍♀️ Let's read an image! The COCO library gives us the image info, and PIL opens the file.

def load_image(self, image_index):
        image_info = self.coco.loadImgs(self.image_ids[image_index])[0]
        path = os.path.join(self.root_dir, 'images', self.set_name, image_info['file_name'])
        image = Image.open(path).convert('RGB')
        return image, (image_info['width'], image_info['height'])

🙋‍♀️ Let's read the annotations too!

def load_annotations(self, image_index):
        # get ground truth annotations
        annotations_ids = self.coco.getAnnIds(imgIds=self.image_ids[image_index], iscrowd=False)
        annotations = np.zeros((0, 5))

        # some images appear to miss annotations (like image with id 257034)
        if len(annotations_ids) == 0:
            return annotations

        # parse annotations
        coco_annotations = self.coco.loadAnns(annotations_ids)
        for idx, a in enumerate(coco_annotations):

            # some annotations have basically no width / height, skip them
            if a['bbox'][2] < 1 or a['bbox'][3] < 1:
                continue

            annotation = np.zeros((1, 5))
            annotation[0, :4] = a['bbox']
            annotation[0, 4] = self.coco_label_to_label(a['category_id'])
            annotations = np.append(annotations, annotation, axis=0)

        # transform from [x, y, w, h] to [x1, y1, x2, y2]
        annotations[:, 2] = annotations[:, 0] + annotations[:, 2]
        annotations[:, 3] = annotations[:, 1] + annotations[:, 3]

        return annotations
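
Since load_annotations returns boxes as [x1, y1, x2, y2], they can be drawn on the image with matplotlib's Rectangle, which takes a corner point plus a width and height. The drawing code itself isn't shown in this post, so the following is only a sketch: draw_boxes is a hypothetical helper, and the Agg backend is selected just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def draw_boxes(image, boxes):
    """Show an image and overlay one rectangle per [x1, y1, x2, y2] box."""
    fig, ax = plt.subplots()
    ax.imshow(image)
    for x1, y1, x2, y2 in boxes:
        # Rectangle expects the top-left corner plus width and height
        ax.add_patch(Rectangle((x1, y1), x2 - x1, y2 - y1,
                               fill=False, edgecolor="red", linewidth=2))
    return fig, ax
```

Passing the PIL image from load_image together with annotation[:, :4] should work here. In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() afterwards; otherwise save the result with fig.savefig(...).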

So how do we use this for training? The train function needs to get the training data from a train loader. The code below is the part that creates that loader.

train_set = COCO_Dataset()
train_loader = DataLoader(train_set,
                              batch_size=1,
                              collate_fn=train_set.collate_fn,
                              shuffle=False,
                              num_workers=0,
                              pin_memory=True)
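
The collate_fn passed to the DataLoader above is not shown in this post. Each image can contain a different number of boxes, so the default collation (which stacks everything into tensors) would fail on them; a custom function is the usual workaround. Below is a minimal sketch of what such a function might look like, assuming transform_COCO has already turned every image into a tensor of the same size. In the class it would be a method of COCO_Dataset; here it is written as a plain function.

```python
import torch


def collate_fn(batch):
    """Combine (image, boxes, labels) samples into one batch.

    Images are stacked into a single tensor; boxes and labels stay as
    lists of tensors because their lengths differ from image to image.
    """
    images, boxes, labels = [], [], []
    for image, box, label in batch:
        images.append(image)
        boxes.append(box)
        labels.append(label)
    images = torch.stack(images, dim=0)  # requires identical image shapes
    return images, boxes, labels
```

This matches how the training loop uses the batch: images as one tensor, boxes and labels as lists that are moved to the GPU one element at a time.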

Once the loader is set up like this, the data is available through train_loader, and the training loop can iterate over it as follows:

for i, (images, boxes, labels) in enumerate(train_loader):
        images = images.cuda()
        boxes = [b.cuda() for b in boxes]
        labels = [l.cuda() for l in labels]