[DL] Weak supervision, Semi-supervised learning

cha-suyeon · March 12, 2022


💻 Machine Learning Lecture for an In-Depth Understanding of Deep Learning, Lecture 11
📄 Various Types of Supervision in Machine Learning
📄 Weak Supervision (Part I)
📄 Snorkel — A Weak Supervision System

I wanted to properly understand the concepts of weak supervision and semi-supervised learning, so I studied the lecture and posts above.

This post is just meant to organize the concepts and terminology.

I may have misunderstood something, so there could be errors in the content. If so, I would appreciate it if you let me know in the comments.

There are various kinds of supervision for training machine learning models.

Data-driven machine learning models are categorized based on how they use labeled samples.

Supervised: the model uses a set of (x, y) for training, where x is the feature vector and y is the associated label.

์ง€๋„ ํ•™์Šต์˜ ๊ฒฝ์šฐ, ๋ชจ๋ธ์€ ํ•™์Šต์„ ์œ„ํ•ด (x,y)(x,y) set๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , ์—ฌ๊ธฐ์„œ x๋Š” feature vector์ด๊ณ , y๋Š” label์ž…๋‹ˆ๋‹ค.

Unsupervised: the model uses just the feature vectors with no label information for training.

In unsupervised learning, the model uses only the feature vectors, with no label information.

It learns patterns from the unlabeled data.

Semi-Supervised: a combination of labeled and unlabeled samples are used for training.

In semi-supervised learning, a combination of labeled and unlabeled samples is used for training.

You can think of it as a mix of the supervised and unsupervised techniques above.

The figure above shows supervision strategies based on the number of labeled samples.

In general, the best performance comes from supervised models.

In this traditional form of supervision, domain experts' time is prioritized for labeling the data points closest to the decision boundary, in the hope that the model can learn from the most valuable information.

However, there are drawbacks. As you probably know, labeled samples are expensive to produce.

Labeling costs even more when domain expertise is required, and the task itself can change over time.

Manually (human-) labeled training data is also static: it does not adapt to changes over time.

Several approaches have emerged to deal with this problem.

Among them, I will summarize the concepts of weak supervision and semi-supervised learning.

Weak supervision

Let me borrow the definition from Wikipedia:

"Weak Supervision is a branch of machine learning where noisy, limited, or imprecise sources are used to provide supervision signal for labeling large amounts of training data in a supervised learning setting."

This approach was reportedly developed to resolve the data-labeling bottleneck.

In weak supervision, the entire dataset carries labels, but those labels only partially capture the final target label.

If labels can be obtained for far more data at a much lower cost, we can extract much more information from that data.

Put more simply, it is a way of assigning labels to data points programmatically.

This method is said to be imperfect, because its supervision signal comes from sources such as:

  • Domain heuristics (e.g. common patterns, rules of thumb, etc.)
  • Existing ground-truth data that is not an exact fit for the task at hand, but close enough to be useful (traditionally called "distant supervision")
  • Unreliable non-expert annotators (e.g. crowdsourcing)

Heuristics, functions, distributions, domain knowledge, and so on can be used to provide noisy labels to a classifier.

The classifier then trains on the noisy labels provided by each source. Snorkel is a well-known data-labeling library built on this idea.
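As a minimal sketch of this idea in plain Python (not the real Snorkel API; the keyword heuristics below are invented for illustration), several noisy labeling functions can vote on each data point:

```python
# Minimal sketch of weak supervision via labeling functions.
# Each function is a noisy heuristic that returns a label or ABSTAIN.
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_positive_words(text):
    # Domain heuristic: presence of positive keywords.
    return POSITIVE if any(w in text.lower() for w in ("good", "great", "love")) else ABSTAIN

def lf_negative_words(text):
    # Domain heuristic: presence of negative keywords.
    return NEGATIVE if any(w in text.lower() for w in ("bad", "terrible", "hate")) else ABSTAIN

def lf_exclamation(text):
    # Weak rule of thumb: enthusiastic punctuation hints at positive sentiment.
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_positive_words, lf_negative_words, lf_exclamation]

def weak_label(text):
    """Combine the noisy votes by majority; abstain if no function fires."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

In a real pipeline, Snorkel's label model weights each labeling function by its estimated accuracy instead of using a simple majority vote.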

Semi-supervised learning

So what is semi-supervised learning?

Semi-supervised learning is a form that mixes supervised and unsupervised learning.

It makes use of unlabeled data, i.e. data that has only features and no labels.

It is a machine learning technique that trains on both the labeled data used in supervised learning and new unlabeled data.

Unlabeled data can be gathered in large quantities from the internet, so if semi-supervised learning can use it efficiently, it has the advantage of drawing on far more data.

In this setting, the small labeled portion of the data can still be learned with supervised learning.

Traditionally, only the tiny fraction of labeled data was actually used for training, since unlabeled data cannot be learned from directly.

With semi-supervised learning, however, the amount of usable data becomes much larger.

Training procedure

For example, suppose we collect 10,000 comments and want to run positive/negative sentiment analysis.

However, only 50 sentences have previously been labeled by hand (positive/negative).

Instead of labeling the rest of the data manually, we take a different approach.

  1. Build a supervised model using the 50 labeled examples.

     Because so few samples are available, the model's performance may suffer.

  2. Build an unsupervised model on the unlabeled data and group the samples into two clusters.

     The data may naturally form several smaller clusters, and forcing two groups may not produce the intended positive/negative split.

  3. Build a semi-supervised model using the labeled data together with all of the unlabeled data.

     The 50 labeled examples are then used to assign labels to the rest of the data, yielding a much larger dataset for building a supervised sentiment prediction model.
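Step 3 above can be sketched as a toy self-training (pseudo-labeling) loop in plain Python; the nearest-centroid classifier, confidence margin, and data points are all invented for illustration:

```python
# Toy self-training (pseudo-labeling) sketch for semi-supervised learning.
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, rounds=3):
    """labeled: list of (point, label); unlabeled: list of points.
    Each round, fit class centroids on the labeled set, then move the
    unlabeled points whose prediction is confident into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        classes = sorted({y for _, y in labeled})
        cents = {c: centroid([p for p, y in labeled if y == c]) for c in classes}
        still_unlabeled = []
        for p in unlabeled:
            # Predict by nearest centroid; "confidence" = margin between the
            # two nearest centroids (a crude stand-in for model probability).
            d = sorted((dist2(p, cents[c]), c) for c in classes)
            if len(d) == 1 or d[1][0] - d[0][0] > 0.5:  # confident enough
                labeled.append((p, d[0][1]))            # pseudo-label it
            else:
                still_unlabeled.append(p)
        unlabeled = still_unlabeled
    return labeled

seed = [((0.0, 0.0), "negative"), ((4.0, 4.0), "positive")]
pool = [(0.2, 0.1), (3.8, 4.1), (0.1, 0.3)]
result = self_train(seed, pool)  # 50 labeled + 9,950 unlabeled in the post's example
```

Real implementations, such as scikit-learn's `SelfTrainingClassifier`, use the base model's predicted probability as the confidence criterion instead of this crude centroid margin.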

๋งŽ์€ ํ•™์Šต ์œ ํ˜•์ด ์žˆ์ง€๋งŒ ๊ทธ์ค‘ ํ•˜๋‚˜์˜ ์˜ˆ์‹œ๋ฅผ ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ๊ฐœ๋…์˜ ์ •์˜๋งŒ ๋ณด๊ณ ๋Š” ์‚ฌ์‹ค ์•„์ง ์ •ํ™•ํžˆ ์ดํ•ดํ•˜๊ธฐ ํž˜๋“  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ €๋Š” weak supervision๊ณผ semi supervision์— ๋Œ€ํ•œ ๋…ผ๋ฌธ์„ ์ฝ์–ด๋ณด๋ฉฐ ์กฐ๊ธˆ ๋” ์•Œ์•„๋ณด๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

I hope the figure below helps you understand the concepts of:

  • supervised learning
  • semi-supervised learning
  • unsupervised learning

Image source

