[CV] Understanding RoIs (Region of Interest)

강동연 · January 25, 2022

[Paper review]

👨‍🏫 This review is based on Kemal Erdem's blog post.

📌 Before review

I'm writing this post because, while reading the Mask R-CNN paper, I ran into a passage about RoIs, and while looking into it I found an excellent explanation that I review here. When reading CV papers, I usually find the paper alone hard to follow and end up consulting other material. It may just be me, but many ideas are hard to understand from the paper alone.
Getting to the main topic: the RoI handling discussed here, in particular RoI pooling, comes from Fast R-CNN. I thought I understood it reasonably well from the paper, but reading this post made me ask myself, "Why didn't I ever have this question?"

What is an RoI?

🎈 An RoI can be described as "a region proposed from the original image." Read literally, Region of Interest means a region worth paying attention to. Since the tasks we care about are mostly detection or segmentation, we need to find the interesting parts of the image, that is, the parts that contain objects.

Feature extraction

🎈 To work with RoIs, Fast R-CNN first passes the whole image through a backbone CNN (VGG16 in this example) to extract a feature map.
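For reference, the factor of 32 comes from VGG16's five 2x2, stride-2 max-pooling layers (2^5 = 32); its 3x3 convolutions use padding 1 and keep the spatial size unchanged. The sketch below assumes the feature map is taken after all five pooling layers, which is what the 512 -> 16 example implies. `vgg16_feature_map_shape` is a hypothetical helper, not part of any library.

```python
def vgg16_feature_map_shape(height: int, width: int, num_pools: int = 5, channels: int = 512):
    """Hypothetical helper: spatial size after `num_pools` 2x2/stride-2 max pools.

    Assumes 3x3 convolutions with padding 1 (size-preserving), so only the
    pooling layers shrink the map. Repeated floor-halving equals one floor
    division by 2**num_pools.
    """
    stride = 2 ** num_pools  # total downsampling factor: 2^5 = 32
    return height // stride, width // stride, channels


print(vgg16_feature_map_shape(512, 512))  # (16, 16, 512)
```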

๐ŸŽˆ ์œ„์˜ ์‚ฌ์ง„์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด 512x512x3์„ Input์œผ๋กœ ๋„ฃ๊ณ  16x16x512์ธ feature map๋ฅผ ์ถ”์ถœํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. feature map์˜ ์‚ฌ์ด์ฆˆ๋Š” input์‚ฌ์ด์ฆˆ๋ฅผ 32๋กœ ๋‚˜๋ˆˆ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ์ด๋ฏธ์ง€์˜ ์ •๋ณด๋ฅผ 16x16x512๋กœ ์••์ถ•์‹œ์ผฐ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฆ‰, ๊ฐ๊ฐ์˜ ๊ณ ์ˆ˜์ค€์˜ ์ •๋ณด๋“ค์„ ๋‹ด๊ณ  ์žˆ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ### Sample RoIs

๐ŸŽˆ ์œ„์˜ ์‚ฌ์ง„์˜ 4๊ฐœ์˜ RoI๋“ค์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ Fast R-CNN์—์„œ๋Š” ์ˆ˜์ฒœ๊ฐœ์˜ RoI๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

🎈 The important point is that RoIs are not bounding boxes. They may look like boxes, but an RoI is only a proposed region of interest passed on for further processing. Most articles and blogs nevertheless draw RoIs as boxes, as above, for convenience.

How to get RoIs from the feature map?

๐ŸŽˆ ์ผ๋ฐ˜์ ์œผ๋กœ RoI๋ฅผ ์ฐพ์•˜๋‹ค๋ฉด, ๊ทธ๊ฒƒ๋“ค feature map์— ๋งคํ•‘ ํ•  ์ˆ˜ ์žˆ์–ด์•ผํ•ฉ๋‹ˆ๋‹ค.

🎈 Every RoI is given by its coordinates and size in the original image. This is where the part that made me write this post begins.

๐ŸŽˆ ์œ„์˜ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์‚ฌ์ด์ฆˆ๋Š” 145x200์ด๋ฉฐ, top-left ์ขŒํ‘œ๋Š” (192,296)์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์˜๋ฌธ์ด ํ•˜๋‚˜ ์ƒ๊น๋‹ˆ๋‹ค. ์šฐ๋ฆฌ์˜ feature map์€ 16x16 ์ด๋ฉด์„œ, 32๋กœ ๋‚˜๋ˆ ๋–จ์–ด์ง€๋Š” ์ˆ˜์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์œ„์˜ ๊ฒฝ์šฐ์—๋Š” ์ž์—ฐ์ˆ˜๋กœ ๋–จ์–ด์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด๋Ÿด ๊ฒฝ์šฐ์—๋Š” ์–ด๋–ป๊ฒŒ ํ•ด์•ผํ• ๊นŒ์š”? ์ €๋Š” Fast, Faster R-CNN์„ ์ฝ์„ ๋•Œ๋Š” ์ด๋Ÿฐ ์˜๋ฌธ์„ ๊ฐ€์ง€์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ์–ด๋–ป๊ฒŒ ๋ณด๋ฉด ๋‹น์—ฐํžˆ ๊ถ๊ธˆํ•ด์•ผํ•˜๋Š” ์งˆ๋ฌธ์ž…๋‹ˆ๋‹ค๋งŒ..

Quantization of coordinates on the feature map

🎈 Quantization is defined as "a process of constraining an input from a large set of values (like real numbers) to a discrete set (like integers)." To be honest, Mask R-CNN was the first place I remember seeing the word quantization (I don't recall it in Fast R-CNN).

๐ŸŽˆ ์œ„์˜ ๋นจ๊ฐ„์ƒ‰ ๋ฐ•์Šค๊ฐ€ ๊ธฐ์กด์˜ RoI ๋ฐ•์Šค์ž…๋‹ˆ๋‹ค. ์‚ฌ์ง„์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด ๊ฐ๊ฐ์˜ ๊ฐ’์ด ๋‚˜๋ˆ  ๋–จ์–ด์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ทธ๋ ‡๊ธฐ์— ์†Œ์ˆ˜์ ์„ ๋ฒ„๋ ค ์ž์—ฐ์ˆ˜๋กœ ๋งŒ๋“ญ๋‹ˆ๋‹ค.(6.25 -> 6, 4.53 -> 4) ์œ„์˜ ์ฃผํ™ฉ์ƒ‰ ๋ถ€๋ถ„์ด ์ˆ˜์ •๋œ ๊ฐ’๋“ค์˜ ๋ฒ”์œ„์ž…๋‹ˆ๋‹ค. ๐ŸŽˆ ๊ฒฐ๊ณผ์ ์œผ๋กœ ํŒŒ๋ž€์ƒ‰ ๋ถ€๋ถ„์˜ ์ •๋ณด๋ฅผ ์žƒ๊ฒŒ๋˜๊ณ  ์ดˆ๋ก์ƒ‰ ๋ถ€๋ถ„์—์„œ ์ƒˆ๋กœ์šด ์ •๋ณด๋ฅผ ์–ป๊ฒŒ๋ฉ๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์œ„์˜ ์ œ์•ฝ์œผ๋กœ ์ธํ•ด์„œ ์›๋ž˜์˜ RoI๋ฅผ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ถ€๋ถ„์€ ํ›„์— RoIAlign์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.

RoI Pooling

๐ŸŽˆ ์ดํ›„ Fast R-CNN์—์„œ๋Š” RoI Pooling์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. RoI Pooling์ด๋ž€ ๋ง ๊ทธ๋Œ€๋กœ RoI๋“ค์„ Poolingํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. Pooling์„ ์ง„ํ–‰ํ•˜๋Š” ์ด์œ ๋Š” ์œ„์˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด FC layers๋ฅผ ์œ„ํ•ด ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ feature๋“ค์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ RoI๋“ค์€ ๊ฐ๊ฐ ๋‹ค๋ฅธ ์‚ฌ์ด์ฆˆ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฅผ Pooling์„ ํ†ตํ•ด์„œ ๊ฐ™์€ ์‚ฌ์ด์ฆˆ๋กœ ๋ฐ”๊ฟ”์ค๋‹ˆ๋‹ค.

๐ŸŽˆ ์œ„์˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด 4x6์˜ RoI๋ฅผ 3x3์˜ ์‚ฌ์ด์ฆˆ๋กœ ๋ฐ”๊ฟ”์ค˜์•ผํ•ฉ๋‹ˆ๋‹ค. 4x6์„ 3x3์œผ๋กœ ๋ฐ”๊ฟ”์ฃผ๊ธฐ ์œ„ํ•ด์„  1x2 vector๋ฅผ ์‚ฌ์šฉํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.(4/3 = 1 x 6/3 = 2) ๊ทธ๋ ‡๊ฒŒ ๋œ๋‹ค๋ฉด ์œ„์˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด ๋˜ ๋‹ค์‹œ ๋งˆ์ง€๋ง‰ ํ–‰์˜ ์ •๋ณด๋ฅผ ์žƒ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

🎈 Pooling every RoI this way produces thousands of 3x3x512 feature maps.
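To make the fixed-size point concrete, here is a small end-to-end sketch of RoI Pooling as described in this post (floor the RoI, then use floored bin sizes), applied to RoIs of different sizes on one random feature map. It uses 8 channels instead of 512 so it runs quickly; `roi_pool` is a hypothetical helper, and library implementations may divide bins differently.

```python
import math
import random


def roi_pool(feature_map, roi, out_size=3):
    """Hypothetical RoI Pooling sketch with floored coordinates and bin sizes.

    feature_map: list of channels, each a list of rows (C x H x W).
    roi: (y, x, h, w) in feature-map coordinates, possibly fractional.
    Assumes the quantized RoI is at least out_size x out_size and inside the map.
    """
    y, x, h, w = (math.floor(v) for v in roi)    # 1st quantization: RoI coordinates
    bin_h, bin_w = h // out_size, w // out_size  # 2nd quantization: bin size
    pooled = []
    for channel in feature_map:
        crop = [row[x:x + w] for row in channel[y:y + h]]
        pooled.append([[max(crop[r][c]
                            for r in range(i * bin_h, (i + 1) * bin_h)
                            for c in range(j * bin_w, (j + 1) * bin_w))
                        for j in range(out_size)]
                       for i in range(out_size)])
    return pooled


random.seed(0)
C, H, W = 8, 16, 16  # 8 channels stand in for VGG16's 512 to keep the demo small
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]

# RoIs of different sizes, in feature-map coordinates (y, x, h, w).
rois = [(6.0, 9.25, 4.53125, 6.25), (0.0, 0.0, 9.7, 12.1), (3.5, 2.2, 3.0, 5.9)]
pooled = [roi_pool(fmap, r) for r in rois]
for r, p in zip(rois, pooled):
    # Every RoI ends up as C x 3 x 3, i.e. a fixed-length input for the FC layers.
    shape = (len(p), len(p[0]), len(p[0][0]))
    print(r, "->", shape, "flattened:", shape[0] * shape[1] * shape[2])
```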

📌 As you can see, the original RoI mapping and RoI Pooling both lose some information. RoIAlign was introduced to solve this problem.

RoI Align

๐Ÿ“Œ ์œ„์˜ ์‚ฌ์ง„์€ MASK R-CNN์˜ testing network์ž…๋‹ˆ๋‹ค. RoI Align์€ mask R-CNN์—์„œ ์ œ์‹œ๋œ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์•ž์—์„œ ์–ธ๊ธ‰ํ–ˆ๋‹ค ์‹ถ์ด RoI Align ๋ฐฉ๋ฒ•์€ ์œ„์˜ ์ œ์‹œ๋œ ๋ฐฉ๋ฒ•์˜ ๋ฌธ์ œ์ (์ •๋ณด์˜ ์†์‹ค)์„ ํ•ด๊ฒฐํ•ด์ฃผ๋Š” ๋ฐฉ์•ˆ์ž…๋‹ˆ๋‹ค.

📌 Mask R-CNN performs instance segmentation. Because it has to predict masks, pixel-level correspondence matters much more, which is why it proposes RoI Align to eliminate this information loss.

📌 RoI Align does not quantize; it uses the RoI values as they are. As noted above, Fast R-CNN quantizes twice (once when mapping the RoI onto the feature map and once when splitting it into pooling bins), so information is lost twice. By skipping quantization, RoI Align avoids that loss.

📌 To match the 3x3 output size, the RoI's width and height are each divided into three equal parts. RoI Align still pools to 3x3, but as the figure above shows, the bins no longer line up exactly with the feature-map cells.

📌 To compute each pooled value, 4 sampling points are created in every bin.

📌 For the first (top-left) bin, with the unquantized RoI $X_{box} = 9.25$, $Y_{box} = 6.00$, $width = 6.25$, $height = 4.53$, the coordinates are:
  • $X = X_{box} + \frac{width}{3} \cdot \frac{1}{3} = 9.94$ (top-left sampling point)
  • $Y = Y_{box} + \frac{height}{3} \cdot \frac{1}{3} = 6.50$ (top-left sampling point)
  • $X = X_{box} + \frac{width}{3} \cdot \frac{1}{3} = 9.94$ (bottom-left sampling point)
  • $Y = Y_{box} + \frac{height}{3} \cdot \frac{2}{3} = 7.01$ (bottom-left sampling point)
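The same formulas can generate the 2x2 sampling points for all nine bins. This sketch follows the post's convention of placing points at 1/3 and 2/3 of each bin; the Mask R-CNN paper only says four regularly sampled locations per bin, and exact offsets differ between implementations, so treat the fractions as an assumption. `sampling_points` is a hypothetical helper.

```python
def sampling_points(x_box, y_box, width, height, out_size=3, fracs=(1 / 3, 2 / 3)):
    """Hypothetical helper: {(bin_row, bin_col): [(x, y), ...]} with 2x2 points per bin.

    Each bin is (width / out_size) x (height / out_size); points sit at the
    given fractions of the bin (the post's 1/3 and 2/3). Nothing is rounded.
    """
    bin_w, bin_h = width / out_size, height / out_size
    points = {}
    for i in range(out_size):          # bin row
        for j in range(out_size):      # bin column
            points[(i, j)] = [(x_box + bin_w * (j + fx), y_box + bin_h * (i + fy))
                              for fy in fracs for fx in fracs]
    return points


# Unquantized RoI on the feature map: x=9.25, y=6.0, width=6.25, height=4.53125.
pts = sampling_points(9.25, 6.0, 6.25, 4.53125)
for x, y in pts[(0, 0)]:
    print(f"({x:.2f}, {y:.2f})")
# (9.94, 6.50), (10.64, 6.50), (9.94, 7.01), (10.64, 7.01), one per line
```

The first and third points are the top-left and bottom-left values listed above.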

📌 We can now find the value at each point using bilinear interpolation.

๐Ÿ“Œ ์œ„์˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด ์ฒซ๋ฒˆ์งธ ํฌ์ธํŠธ ๋ฐ•์Šค๋ฅผ ๋ณด์‹œ๋ฉด, (9.94, 6.50)์˜ ์ขŒํ‘œ์—์„œ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด top-left ๋ฐฉํ–ฅ์˜ ์…€์˜ ์ค‘๊ฐ„ ์ขŒํ‘œ๋Š” (9.50, 6.50), bottom-left ๋ฐฉํ–ฅ์˜ ์…€์˜ ์ค‘๊ฐ„ ์ขŒํ‘œ๋Š” (9.50, 7.50), ์œ„์™€ ๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณด๋ฉด ๊ฐ๊ฐ (10.50, 6.50), (10.50, 7.50) ์ขŒํ‘œ๋“ค์ด ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ขŒํ‘œ์ž…๋‹ˆ๋‹ค. ๐Ÿ“Œ ์œ„์˜ ์ขŒํ‘œ๋“ค๋กœ bilinear interpolation์„ ๊ณ„์‚ฐํ•˜๋ฉด ๊ฐ ์ง€์ ์˜ ๊ฐ’์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๐Ÿ“Œ ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ 4๊ฐœ์ ์„ ๋ชจ๋‘ ๊ตฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ 4๊ฐœ์˜ ๊ฐ’์„ ๊ฐ€์ง€๊ณ  Max pooling(Avg pooling) ์ง„ํ–‰ํ•ด 3x3 feature map์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๐Ÿ“Œ ๋‚˜๋จธ์ง€๋Š” ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•์ด๋ž‘ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ RoI Align๋Š” quantization ์—†์ด pooling์„ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฆ‰ ์ •๋ณด์˜ ์†์‹ค์—†์ด Pooling์„ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.