### Annotation
- A text file that holds the information about each object (its name and coordinates) needed to train an object detection model
### Pascal VOC
- xmin
  - The x-coordinate of the object's left edge
- ymin
  - The y-coordinate of the object's top edge
- xmax
  - The x-coordinate of the object's right edge
- ymax
  - The y-coordinate of the object's bottom edge
- name
  - The name of the object
### YOLO
- class index
  - The index of the object's name in the class definition file
- xcenter: (xmin + (xmax - xmin)/2) / image_width
- ycenter: (ymin + (ymax - ymin)/2) / image_height
- width: (xmax - xmin) / image_width
- height: (ymax - ymin) / image_height
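The four formulas above can be sketched as a small helper. The function name `voc_to_yolo` and its argument order are assumptions made for illustration; the sample values are the `engine` box and the 501 x 333 image size that appear in this notebook's example annotation.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert a Pascal VOC pixel box to normalized YOLO values.

    Hypothetical helper: returns (xcenter, ycenter, width, height),
    each expressed as a fraction of the image size.
    """
    box_width = xmax - xmin
    box_height = ymax - ymin
    xcenter = (xmin + box_width / 2) / image_width
    ycenter = (ymin + box_height / 2) / image_height
    return (xcenter, ycenter, box_width / image_width, box_height / image_height)


# Box (213, 180, 265, 205) in a 501 x 333 image
yolo_box = voc_to_yolo(213, 180, 265, 205, 501, 333)
print([round(v, 6) for v in yolo_box])  # [0.477046, 0.578078, 0.103792, 0.075075]
```

Every value falls between 0 and 1 because each one is divided by the image width or height.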

import cv2 as cv
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Read the image to get its height, width, and channel count
filepath = 'annotation/aeroplane_01.jpg'
image = cv.imread(filepath)
if image is None:  # cv.imread returns None when the file cannot be read
    raise FileNotFoundError(filepath)
h, w, c = image.shape

# Root element and image metadata
annotation = ET.Element('annotation')
filename = ET.SubElement(annotation, 'filename')
filename.text = 'aeroplane_01.jpg'
size = ET.SubElement(annotation, 'size')
width = ET.SubElement(size, 'width')
height = ET.SubElement(size, 'height')
depth = ET.SubElement(size, 'depth')
width.text = str(w)
height.text = str(h)
depth.text = str(c)

# One <object> element per labeled object: its name and bounding box
obj = ET.SubElement(annotation, 'object')
name = ET.SubElement(obj, 'name')
bndbox = ET.SubElement(obj, 'bndbox')
xmin = ET.SubElement(bndbox, 'xmin')
ymin = ET.SubElement(bndbox, 'ymin')
xmax = ET.SubElement(bndbox, 'xmax')
ymax = ET.SubElement(bndbox, 'ymax')
name.text = 'engine'
xmin.text = '213'
ymin.text = '180'
xmax.text = '265'
ymax.text = '205'

# Pretty-print the tree and save it as an XML file
with open('annotation/aeroplane_01.xml', 'w', encoding='utf-8') as f:
    f.write(minidom.parseString(ET.tostring(annotation)).toprettyxml(indent=" "))
<?xml version="1.0" ?>
<annotation>
 <filename>aeroplane_01.jpg</filename>
 <size>
  <width>501</width>
  <height>333</height>
  <depth>3</depth>
 </size>
 <object>
  <name>engine</name>
  <bndbox>
   <xmin>213</xmin>
   <ymin>180</ymin>
   <xmax>265</xmax>
   <ymax>205</ymax>
  </bndbox>
 </object>
</annotation>
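This is an annotation in Pascal VOC format.
Because the box is stored in absolute pixel coordinates, the annotation becomes invalid when the image is resized unless the coordinates are rescaled too.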
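To make the resize problem concrete, the sketch below scales a pixel box by the same factors as the image. `rescale_voc_box` is a hypothetical helper written for this illustration, and the target size of 1002 x 666 is an arbitrary choice.

```python
def rescale_voc_box(box, old_size, new_size):
    """Scale a (xmin, ymin, xmax, ymax) pixel box to a resized image.

    Hypothetical helper: old_size and new_size are (width, height) tuples.
    """
    xmin, ymin, xmax, ymax = box
    scale_x = new_size[0] / old_size[0]
    scale_y = new_size[1] / old_size[1]
    return (round(xmin * scale_x), round(ymin * scale_y),
            round(xmax * scale_x), round(ymax * scale_y))


# Doubling a 501 x 333 image moves every corner of the box
resized_box = rescale_voc_box((213, 180, 265, 205), (501, 333), (1002, 666))
print(resized_box)  # (426, 360, 530, 410)
```

If the XML still held the original values after such a resize, the box would cover the wrong region of the image.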
import xml.etree.ElementTree as ET

# Load the annotation file and inspect its root element
tree = ET.parse('annotation/aeroplane_01.xml')
root = tree.getroot()
root.tag

# Image size recorded in the annotation
size = root.find('size')
width = int(size.find('width').text)
height = int(size.find('height').text)
(width, height)

# Name and bounding box of every labeled object
objects = root.findall('object')
for obj in objects:
    name = obj.find('name').text
    bndbox = obj.find('bndbox')
    xmin = bndbox.find('xmin').text
    ymin = bndbox.find('ymin').text
    xmax = bndbox.find('xmax').text
    ymax = bndbox.find('ymax').text
    display([name, xmin, ymin, xmax, ymax])

# Map each class name to its line index in the class definition file
classes = {}
with open('annotation/classes.txt', 'r', encoding='utf-8') as f:
    for index, line in enumerate(f):
        classes[line.strip()] = index
display(classes)
========================================================================================
{'aeroplane': 0,
 'people': 1,
 'female': 2,
 'male': 3,
 'car': 4,
 'robot': 5,
 'engine': 6}

import xml.etree.ElementTree as ET

tree = ET.parse('annotation/aeroplane_01.xml')
root = tree.getroot()

# The image size is needed to normalize the pixel coordinates
size = root.find('size')
image_width = int(size.find('width').text)
image_height = int(size.find('height').text)

# Write one YOLO line per object: class index, then the normalized box
with open('annotation/aeroplain_01.my.txt', 'w', encoding='utf-8') as f:
    for obj in root.findall('object'):
        name = obj.find('name').text
        bndbox = obj.find('bndbox')
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)
        class_index = classes[name]  # name-to-index mapping loaded from classes.txt
        xcenter = round((xmin + (xmax - xmin) / 2) / image_width, 6)
        ycenter = round((ymin + (ymax - ymin) / 2) / image_height, 6)
        width = round((xmax - xmin) / image_width, 6)
        height = round((ymax - ymin) / image_height, 6)
        f.write(f'{class_index} {xcenter} {ycenter} {width} {height}\n')

with open('annotation/aeroplain_01.my.txt', 'r', encoding='utf-8') as file:
    content = file.read()
    print(content)
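
Because YOLO stores annotations in normalized form, they stay valid when the image is resized, which makes the YOLO format better suited to model training.
Use Pascal VOC for data processing.
Use YOLO for model training.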
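
As a rough illustration of why normalized values survive a resize, the sketch below maps one YOLO box back to pixels for two image sizes. `yolo_to_voc` is a hypothetical helper, not part of any library; the normalized values are the ones the YOLO formulas give for the example `engine` box, and the second image size is an arbitrary choice.

```python
def yolo_to_voc(xcenter, ycenter, width, height, image_width, image_height):
    """Convert normalized YOLO values back to a pixel box (hypothetical helper)."""
    box_width = width * image_width
    box_height = height * image_height
    xmin = xcenter * image_width - box_width / 2
    ymin = ycenter * image_height - box_height / 2
    return (round(xmin), round(ymin),
            round(xmin + box_width), round(ymin + box_height))


yolo_values = (0.477046, 0.578078, 0.103792, 0.075075)

# The same normalized values describe the box at any resolution
original_box = yolo_to_voc(*yolo_values, 501, 333)
doubled_box = yolo_to_voc(*yolo_values, 1002, 666)
print(original_box)  # (213, 180, 265, 205)
print(doubled_box)   # (426, 360, 530, 410)
```

The annotation line itself never changes; only the image size passed at conversion time does.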