[PyTorch] Image and Video

YSL · July 3, 2023

๐Ÿ“ PyTorch Tutorials


TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL

Task : Fine-tune a pre-trained Mask R-CNN model on the Penn-Fudan Database for Pedestrian Detection and Segmentation
⇒ train an Instance Segmentation model that determines whether each object is a person

Mask R-CNN
For each RoI (Region of Interest) obtained from Faster R-CNN's RPN: a classification branch that predicts the object's class + a bbox regression branch that performs bbox regression + a parallel mask branch that predicts the segmentation mask
⇒ one part proposes candidate regions that may contain an object (RoIs), extracting features with selective search, an RPN (Region Proposal Network), etc. → each given RoI is then classified and its bbox regressed

  • mask branch : a small FCN (fully convolutional network) attached to each RoI
  • mask : the piece of the image segmented out per class

Data

  • For Instance Segmentation, the target must include "boxes", "labels", and "masks"
import os
import numpy as np
import torch
import torch.utils.data
from PIL import Image


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.array(mask)
        obj_ids = np.unique(mask) # instances are encoded as different colors
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # only one class (label), since the goal is just to find people
        labels = torch.ones((num_objs,), dtype=torch.int64) # shape : (num_objs, )
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd # ๋ฌผ์ฒด๊ฐ€ ๋„ˆ๋ฌด ์ž‘์€๋ฐ ๋งŽ์•„์„œ ํ•˜๋‚˜์˜ ๊ตฐ์ง‘์œผ๋กœ ๋ฐ•์Šค๋ฅผ ์ฒ˜๋ฆฌํ•˜์—ฌ ๋ ˆ์ด๋ธ”๋งํ–ˆ๋Š”์ง€์— ๊ด€ํ•œ ์—ฌ๋ถ€

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
  • __init__ method : fetches the image file paths, sorts them, and stores them as imgs and masks
  • __getitem__() method : indexes the paths with idx, open()s the files, and converts only the img (not the mask) to RGB (each color in the mask marks a different instance, so it must not be casually converted to RGB) & returns a target dictionary holding boxes, labels, masks, image_id, area, and iscrowd (a usage sketch follows below)
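A minimal usage sketch (the 'PennFudanPed' root path is an assumption): because each target is a dict whose tensors differ in size per image, a detection DataLoader needs a collate_fn that keeps samples as tuples instead of stacking them.

dataset = PennFudanDataset('PennFudanPed')
img, target = dataset[0]
print(target["boxes"].shape, target["masks"].shape)

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True,
    collate_fn=lambda batch: tuple(zip(*batch)))  # keep (img, target) pairs unstacked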

Model

  • maskrcnn_resnet50_fpn : Mask R-CNN with a ResNet-50 backbone combined with an FPN

    FPN (Feature Pyramid Network)
    Extracts features in a top-down fashion and fuses the resulting low-resolution and high-resolution feature maps
    Objects are detected from features extracted independently at each level, and since the already-computed features of the upper levels are reused, multi-scale features can be exploited efficiently

    • The semantic information extracted in the forward pass is upsampled during the top-down pass to raise its resolution
    • The information lost in the forward pass is supplemented through skip connections

    1. Bottom-up pathway
      : the feedforward computation of the backbone ConvNet → condenses semantic information at every layer
      The output of the last layer of each stage = the reference set of feature maps

    2. Top-down pathway and lateral connections
      : upsamples feature maps to produce higher-resolution maps
      Merged via skip connections with the same-sized bottom-up feature maps to recover the lost localization information
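As a shape-level illustration (not from the tutorial), torchvision ships a standalone FeaturePyramidNetwork module; the channel counts below are assumptions chosen for the sketch:

import torch
from torchvision.ops import FeaturePyramidNetwork

# three bottom-up feature maps of decreasing resolution (channels assumed)
feats = {'c2': torch.randn(1, 256, 64, 64),
         'c3': torch.randn(1, 512, 32, 32),
         'c4': torch.randn(1, 1024, 16, 16)}
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024], out_channels=256)
out = fpn(feats)  # top-down + lateral merging; every level now has 256 channels
print([v.shape for v in out.values()])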

Ways to make use of FINETUNING

  • Fine-tuning starting from a pre-trained model (see the sketch after this list)
  • Modifying the model to swap in a different backbone
    - backbone
    : if the various tasks (detecting objects, splitting regions, ...) are the limbs of a body, a classification model such as ResNet is the trunk: it takes the input, extracts a variety of features, and hands them to the module appropriate for each task
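A sketch of the first option, following the tutorial's approach for Penn-Fudan (2 classes = background + person): load a pre-trained Mask R-CNN and replace its box and mask prediction heads.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + person

# replace the box predictor head
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# replace the mask predictor head
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)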

Train

  • lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    : sets up a learning-rate scheduler so that the learning rate shrinks by a factor of 10 every 3 epochs

์ฒ˜์Œ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๊ฐ™์€ learning rate๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์ฒ˜์Œ์—๋Š” ํฐ ๋ณดํญ์œผ๋กœ ๋น ๋ฅด๊ฒŒ ์ตœ์ ํ™”ํ•˜๊ณ  ์ตœ์ ๊ฐ’์— ๊ฐ€๊นŒ์›Œ์งˆใ„น์ˆ˜๋ก ๋ณดํญ์„ ์ค„์—ฌ ๋ฏธ์„ธ์กฐ์ •ํ•˜๋Š” ๊ฒƒ์ด ํ•™์Šต์ด ๋” ์ž˜๋œ๋‹ค๊ณ  ์•Œ๋ ค์ ธ ์žˆ์Œ

<Learning rate scheduler>
step 1. define the optimizer and the scheduler
step 2. during training, call optimizer.step() every batch and scheduler.step() every epoch

for epoch in range(epochs):
    for x_data, y_data in data_loader:
        optimizer.zero_grad()
        estimated_y = model(x_data)
        loss = criterion(estimated_y, y_data)  # criterion = the loss function; don't shadow it with the loss value
        loss.backward()
        optimizer.step()    # parameter update every batch
    scheduler.step()        # learning-rate update every epoch

Besides StepLR(), which decays the lr by the ratio gamma every step_size epochs, there are
- LambdaLR() : adjusts the lr by multiplying the initial lr by the value returned from a lambda function
- MultiplicativeLR() : adjusts the lr by the cumulative product of the values returned from a lambda function
- MultiStepLR() : lets you directly specify the epochs at which the lr is decayed
and other learning-rate schedulers that PyTorch provides (a few are sketched below)
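A quick sketch of how each of these is constructed (model is assumed defined; the lambda values are arbitrary placeholders):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# lr = initial_lr * lambda(epoch)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

# lr = previous_lr * lambda(epoch), i.e. a cumulative product
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

# decay by gamma exactly at the listed epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)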


TRANSFER LEARNING FOR COMPUTER VISION TUTORIAL

Task : take a convolutional neural network pre-trained on a very large dataset such as ImageNet, replace only its final FC layer with new random weights, and train only that layer

Data

  • ImageFolder : a module provided by torchvision for loading datasets organized in a hierarchical folder structure
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}

⇒ if each image sits inside a folder named after its class, the dataset can be turned into an object using the ImageFolder class
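For the tutorial's ants/bees data, the expected layout looks like this (file names are placeholders):

data_dir/
├── train/
│   ├── ants/xxx.jpg ...
│   └── bees/yyy.jpg ...
└── val/
    ├── ants/...
    └── bees/...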

  • next(), iter()
inputs, classes = next(iter(dataloaders['train'])) # fetch a single batch from the dataset

Train

Load the pre-trained model, freeze every part except the last layer, and fine-tune only the final FC layer

model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model_conv.parameters():
    param.requires_grad = False # freeze every layer of the model

# Parameters of a newly constructed layer (= nn.Linear()) have ``requires_grad = True`` by default
num_ftrs = model_conv.fc.in_features # number of input features of the fc layer
model_conv.fc = nn.Linear(num_ftrs, 2) # resize the fc output to our task (ants/bees classification)

model_conv = model_conv.to(device)
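Since only the newly created fc layer has requires_grad=True, the optimizer can be given just those parameters, as in the tutorial:

optimizer_conv = torch.optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)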

ADVERSARIAL EXAMPLE GENERATION

  • Adversarial Attack
    : crafting inputs that intentionally induce misclassification, using a specific noise (= perturbation) designed to exploit the internal vulnerabilities of a deep learning model
    ⇒ by generating adversarial examples, the performance of various ML-based systems can be degraded on purpose, creating security problems

    - Adversarial Example
    : an input that induces an "optical illusion" in an ML model
    ⇒ goal : find the minimal noise distribution that can cross the decision boundary
    ⇒ in the tutorial's figure, the rightmost image = the Adversarial Example
    +) the noised picture must still look indistinguishable from the original to a human

FGSM & PGD
: examples of adversarial attacks

  • FGSM(Fast Gradient Sign Method)

$$w^T\tilde{x} = w^T x + w^T\eta,\qquad \tilde{x} = x + \eta,\qquad \eta = \epsilon \cdot \mathrm{sign}(w),\qquad w^T\eta = \epsilon \cdot w^T \mathrm{sign}(w) = \epsilon\|w\|_1$$

Here, when $\eta$ (= the perturbation) is sufficiently small, the classifier assigns $x$ and $\tilde{x}$ to the same class
→ but $w^T\eta$ can grow in proportion to the dimension $n$, so in high-dimensional problems a small noise on the input can produce a large difference in the output

Normally, training a model means searching for the point where the loss is lowest
FGSM : the model has already finished training, so its parameters are frozen and the data is manipulated instead
⇒ the image is perturbed in the direction that raises the loss the most (= so that the model gives the wrong answer as hard as possible), as the formula below shows
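Written out (the standard FGSM update, consistent with the fgsm_attack code below): the input is nudged by $\epsilon$ in the sign direction of the gradient of the loss $J$ with respect to the input:

$$\tilde{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big)$$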

def fgsm_attack(image, epsilon, data_grad):
    sign_data_grad = data_grad.sign() # element-wise sign of data_grad
    perturbed_image = image + epsilon*sign_data_grad # apply sign_data_grad to every pixel of the input image, producing a slightly perturbed image
    perturbed_image = torch.clamp(perturbed_image, 0, 1) # clip so pixel values stay in the [0,1] range
    return perturbed_image
  • PGD
    : a variant built on the FGSM method; the attack is repeated for $n$ steps, and at each step the data $x$ is perturbed by a step size (a learning rate) rather than by $\epsilon$ (a sketch follows below)
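A minimal PGD sketch under that description (pgd_attack is a hypothetical helper, not from the tutorial; the model is assumed to return log-probabilities, matching the FGSM tutorial):

import torch
import torch.nn.functional as F

def pgd_attack(model, image, label, epsilon, alpha, num_steps):
    orig = image.detach()
    perturbed = orig.clone()
    for _ in range(num_steps):
        perturbed.requires_grad_(True)
        loss = F.nll_loss(model(perturbed), label)
        grad = torch.autograd.grad(loss, perturbed)[0]
        with torch.no_grad():
            perturbed = perturbed + alpha * grad.sign()                          # FGSM-style step of size alpha
            perturbed = orig + torch.clamp(perturbed - orig, -epsilon, epsilon)  # project back into the epsilon-ball
            perturbed = torch.clamp(perturbed, 0, 1)                             # keep a valid image
    return perturbed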

Testing

  1. Predict the classification result on the test set's original input data
    2-1. If the prediction is wrong, move on to the next sample
    2-2. If the prediction is correct, compute the loss and the gradient
    → perturbed_data = fgsm_attack(data, epsilon, data_grad) : use the gradient to create the noise-added image
    → output = model(perturbed_data) : re-classify the noised (= attacked) image
    → compute the re-classification accuracy (condensed in the sketch below)
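Condensed, the per-sample attack step from the tutorial looks roughly like this (data, target, model, epsilon are assumed in scope; F = torch.nn.functional):

data.requires_grad = True                  # we need gradients w.r.t. the input, not the weights
output = model(data)
init_pred = output.max(1, keepdim=True)[1]
if init_pred.item() == target.item():      # only attack initially-correct predictions
    loss = F.nll_loss(output, target)
    model.zero_grad()
    loss.backward()
    data_grad = data.grad.data
    perturbed_data = fgsm_attack(data, epsilon, data_grad)
    output = model(perturbed_data)         # re-classify the attacked image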

⇒ relationship between accuracy and $\epsilon$ : a trade-off
= as $\epsilon$ increases (= the noise gets larger), the test accuracy decreases


DCGAN TUTORIAL

Generative Adversarial Networks

[์ผ€๋ผ์Šค ์ฐฝ์‹œ์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” ๋”ฅ๋Ÿฌ๋‹] ์ฝ”๋“œ์—๋Š”
real image = 0 / generator๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋œ fake image = 1๋กœ ๋ผ๋ฒจ๋ง๋˜์–ด ์žˆ์–ด
ํ•„๊ธฐ ๋‚ด์šฉ์ด PyTorch ํŠœํ† ๋ฆฌ์–ผ๊ณผ ๋ฐ˜๋Œ€์ž„

Discriminator

โ‡’ "๋ณ€ํ™”๋„(gradient)๋ฅผ ์ƒ์Šน(ascending)์‹œํ‚ค๋ฉฐ ํ›ˆ๋ จโ€
= log(D(x))+log(1โˆ’D(G(z)))log(D(x)) + log(1-D(G(z))) ์ตœ๋Œ€ํ™”์‹œํ‚ค๊ธฐ
= D(x)=1D(x) = 1, D(G(z))=0D(G(z)) = 0์œผ๋กœ ์ž˜ ํŒ๋ณ„ํ•˜๋„๋ก ํ•™์Šต์‹œํ‚ค๊ธฐ

Generator

= minimize $\log(1 - D(G(z)))$
= train so that $D(G(z)) = 1$ (so that the discriminator predicts the generator's fake image to be a real image (1))

→ in practice, maximize $\log D(G(z))$ instead (the objective above trains poorly, since it saturates while the discriminator easily rejects early fakes)

Label the images produced by the generator as real (a lie) and pass them to the Discriminator that was just updated
→ the Discriminator judges the input to be a fake image
→ train the Generator to reduce the loss between the input label (= the generated image falsely labeled real) and the discriminator's verdict (see the one-iteration sketch below)

+) When training the Discriminator, both real images and fake images from the Generator are used,
but when training the Generator, real images cannot be used; using only the signal coming back from the Discriminator, it is trained to produce fakes as close to real images as possible without ever seeing one
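One training iteration, condensed from the DCGAN tutorial (netD, netG, optimizerD, optimizerG, nz, b_size, real_batch, device are assumed to be defined as in the tutorial):

import torch
import torch.nn as nn

criterion = nn.BCELoss()
real_label, fake_label = 1., 0.   # tutorial convention: real = 1, fake = 0

# --- Discriminator: maximize log(D(x)) + log(1 - D(G(z))) ---
netD.zero_grad()
label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
errD_real = criterion(netD(real_batch).view(-1), label)      # real batch -> 1
errD_real.backward()
noise = torch.randn(b_size, nz, 1, 1, device=device)
fake = netG(noise)
label.fill_(fake_label)
errD_fake = criterion(netD(fake.detach()).view(-1), label)   # fake batch -> 0
errD_fake.backward()
optimizerD.step()

# --- Generator: maximize log(D(G(z))) ---
netG.zero_grad()
label.fill_(real_label)            # the "lie": fakes get the real label here
errG = criterion(netD(fake).view(-1), label)
errG.backward()
optimizerG.step()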

DCGAN (Deep Convolutional Generative Adversarial Network)

The discriminator stacks its layers with Conv2d(), while the generator stacks its layers with ConvTranspose2d()

  • Convolution Layer
    : has a down-sampling effect (input dimensions > output dimensions)

  • Transposed Convolutional Layer
    : up-samples back to the spatial dimensions of the original layer (input dimensions < output dimensions)

    If the original size is 5 x 6 and the input size is 2 x 3, applying ConvTranspose2d() with a 4 x 4 kernel yields 2 * 3 tensors of size 5 x 6, which are summed to build the final output (verified in the sketch below)
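A quick shape check of that example (stride = 1 is assumed):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 2, 3)                                    # input size 2 x 3
deconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=1, bias=False)
print(deconv(x).shape)   # torch.Size([1, 1, 5, 6]) : (2-1)+4 = 5, (3-1)+4 = 6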


SPATIAL TRANSFORMER NETWORKS TUTORIAL

๐Ÿ“ ์ฐธ๊ณ ์ž๋ฃŒ

In image classification, spatial invariance (still recognizing an image after it has been transformed) matters;
CNNs rely on max pooling layers for this, whereas
a Spatial Transformer uses an affine transformation (cropping and transforming a particular part of the image and training on that part alone)

⇒ through the spatial transformer module, the network can reason about images corrupted with distortion, rotation, and similar noise and still produce the appropriate output

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

        # Spatial transformer localization-network
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2, stride=2),
            nn.ReLU(True)
        )

        # Regressor for the 3 * 2 affine matrix
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32),
            nn.ReLU(True),
            nn.Linear(32, 3 * 2)
        )

        # Initialize the weights/bias with identity transformation
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    # Spatial transformer network forward function
    def stn(self, x):
        xs = self.localization(x)
        xs = xs.view(-1, 10 * 3 * 3)
        theta = self.fc_loc(xs)
        theta = theta.view(-1, 2, 3)

        grid = F.affine_grid(theta, x.size())
        x = F.grid_sample(x, grid)

        return x

    def forward(self, x):
        # transform the input
        x = self.stn(x)

        # Perform the usual forward pass
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


model = Net().to(device)

You can see the STN being added on top of an ordinary CNN model in the code above
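The core of stn() is the affine_grid + grid_sample pair. A small sanity check (shapes assumed): with the identity parameters the fc_loc layer is initialized to, the sampler should return the input essentially unchanged.

theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])  # identity 2 x 3 affine matrix, shape (N, 2, 3)
x = torch.randn(1, 1, 28, 28)
grid = F.affine_grid(theta, x.size(), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)
print(torch.allclose(out, x, atol=1e-5))  # True: the identity transform reproduces the input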


OPTIMIZING VISION TRANSFORMER MODEL FOR DEPLOYMENT
