[NLP] CS224N Lecture 21 Summary [Hugging Face Tutorial 🤗]

김성윤 (Jack) · September 5, 2025

Example code from the lecture ⇒ https://colab.research.google.com/drive/13r94i6Fh4oYf-eJRSi7S_y_cen5NYkBm#scrollTo=OTsW-Wwi-X81

1. Introducing the Hugging Face Libraries and Basic Setup

Library overview

  • Hugging Face provides libraries that make it easy and efficient to use state-of-the-art Transformer-based NLP models.
  • They integrate closely with PyTorch, which makes both training and using models convenient.
  • The Hugging Face Hub hosts many shared pre-trained models such as BERT and GPT, so it is easy to find a model for the task at hand.

Installation

  • Two packages are needed to follow along:
    • transformers: provides pre-trained models (BERT, GPT-2, etc.) and tokenizers.
    • datasets: makes it easy to load a wide range of datasets for training and evaluation.
!pip install transformers datasets

2. Three Steps to Using a Hugging Face Model

Using the Hugging Face libraries for a given task breaks down into three broad steps.

Step 1: Find a model

  • The Hugging Face Hub is a large repository of models.
  • You can search for models specialized for a particular NLP task, such as zero-shot classification or text generation.
  • The lecture uses distilbert-base-uncased-finetuned-sst-2-english as its example.

Step 2: Load the tokenizer and model

  • The tokenizer converts raw input text into tokens in the numeric form the model understands.
  • With AutoTokenizer, passing a model name to from_pretrained is enough to load the matching tokenizer automatically.
  • The model is loaded the same way, through a class such as AutoModelForSequenceClassification and the name of the pre-trained checkpoint.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

Step 3: Process the inputs and make predictions

  • Tokenize the input sentences with the loaded tokenizer and pass the result to the model.
  • The model makes its prediction from the input tokens and outputs logits.
  • Applying softmax to the logits gives a probability for each class (label), and argmax picks the final prediction.
import torch

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# Tokenize the sentences
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

# Pass the inputs to the model to get predictions
outputs = model(**inputs)
logits = outputs.logits

# Convert the logits to probabilities
predictions = torch.nn.functional.softmax(logits, dim=-1)
print(predictions)

# Find the class (label) with the highest probability
predicted_labels = torch.argmax(predictions, dim=1)
print(predicted_labels)

# Convert label IDs to their names
print([model.config.id2label[label_id] for label_id in predicted_labels.tolist()])
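
To make the softmax and argmax step concrete, here is a minimal pure-Python sketch of the same arithmetic on a single row of logits. The logit values, the helper name softmax, and the id2label mapping are made up for illustration; they are not the model's real output.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result.
    biggest = max(logits)
    exps = [math.exp(x - biggest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence over two classes: [NEGATIVE, POSITIVE]
example_logits = [-1.5, 1.5]
probs = softmax(example_logits)

# argmax: index of the largest probability
predicted_id = max(range(len(probs)), key=lambda i: probs[i])

id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed mapping for illustration
print(probs, id2label[predicted_id])
```

Because softmax is monotonic, taking argmax over the raw logits would select the same class; softmax is only needed when the probabilities themselves are of interest.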

3. A Closer Look at the Tokenizer

The tokenizer is the key component that preprocesses the model's input.

What the tokenizer does

  • Purpose: convert natural-language text (a string) into a sequence of numeric IDs the model can process.
  • Main functions:
    • Splits a sentence into word-level or subword-level tokens.
    • Maps each token to a unique numeric ID (input_ids).
    • Builds an attention mask (attention_mask) so the model can tell real tokens from padding.
    • Adds special tokens that depend on the model, such as [CLS] at the start of a sequence and [SEP] at the end.
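
The four functions above can be sketched with a toy word-level tokenizer in plain Python. Everything here (the eight-entry vocabulary, toy_tokenize, toy_encode) is invented for illustration; real Hugging Face tokenizers use learned subword vocabularies and far more careful normalization.

```python
# Toy vocabulary, invented for this example (real vocabularies have tens of thousands of entries).
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "i": 4, "love": 5, "nlp": 6, "!": 7}

def toy_tokenize(text):
    """Lowercase, split off '!' and split on whitespace (a stand-in for subword splitting)."""
    return text.lower().replace("!", " !").split()

def toy_encode(text):
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]          # add special tokens
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]   # map each token to its ID
    attention_mask = [1] * len(input_ids)                        # every position is a real token
    return {"tokens": tokens, "input_ids": input_ids, "attention_mask": attention_mask}

encoded = toy_encode("I love NLP!")
print(encoded)
```

Words missing from the vocabulary fall back to [UNK] in this sketch; subword tokenizers avoid most such cases by splitting unknown words into smaller known pieces.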

Advantages of AutoTokenizer

  • Each model uses its own tokenizer, so managing them by hand can be tedious.
  • AutoTokenizer finds and loads the right tokenizer from just the model name passed to from_pretrained(), which reduces mistakes and adds convenience.
  • Under the hood there are Python-based tokenizers and "fast" tokenizers implemented in Rust; the faster Rust-backed version is normally used when one is available.

Using the tokenizer

  • Basic use: calling the tokenizer on a sentence returns a dictionary-like object containing input_ids and attention_mask.
  • Main options:
    • return_tensors='pt': returns the result as PyTorch tensors.
    • padding=True: appends padding tokens to the shorter sentences in a batch so that every sentence matches the longest one.
    • truncation=True: truncates sentences that exceed the maximum length the model can handle.
  • Decoding: decode (or batch_decode for a batch) turns input_ids back into a string, and skip_special_tokens=True leaves out the special tokens. With an uncased checkpoint the decoded text comes back lowercased, so it is not always identical to the original string.
# Inspect the tokenization step
sequence = "Hugging Face Transformers is great!"
tokenized_output = tokenizer(sequence)
print(tokenized_output)
# A dict with 'input_ids' and 'attention_mask'. For this BERT-style checkpoint the IDs
# start with 101 ([CLS]) and end with 102 ([SEP]); the IDs in between depend on the vocabulary.
# The attention_mask is all 1s because a single sequence needs no padding.

# Convert the input_ids back to tokens
tokens = tokenizer.convert_ids_to_tokens(tokenized_output['input_ids'])
print(tokens)
# ['[CLS]', 'hugging', 'face', 'transformers', 'is', 'great', '!', '[SEP]']

# Decode
decoded_string = tokenizer.decode(tokenized_output['input_ids'], skip_special_tokens=True)
print(decoded_string)
# hugging face transformers is great!  (lowercased, because the checkpoint is uncased)
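
The effect of padding=True and truncation=True on a batch can be sketched in plain Python. This is a simplified illustration under stated assumptions: PAD_ID = 0 and the function name pad_and_truncate are hypothetical, and unlike a real tokenizer this sketch does not keep the closing [SEP] token when it truncates.

```python
PAD_ID = 0  # assumed padding ID for this sketch

def pad_and_truncate(batch_ids, max_length):
    """Truncate each sequence to max_length, then right-pad to the longest one in the batch."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two already-encoded sequences of different lengths (IDs are made up)
batch = pad_and_truncate([[2, 4, 5, 6, 7, 3], [2, 4, 3]], max_length=5)
print(batch)
```

After this step every row has the same length, which is what allows the batch to be stacked into one rectangular tensor.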

4. A Closer Look at Hugging Face Models

Types of model architecture

  • Transformer models fall into three broad architectures.
    • Encoder models: specialized for understanding the overall meaning of a sentence (e.g. BERT, RoBERTa). Mainly used for tasks such as sentence classification and named entity recognition.
    • Decoder models: specialized for predicting the next word from the preceding words (e.g. GPT-2). Mainly used for text generation.
    • Encoder-decoder models: the encoder reads the input sentence and the decoder generates a new sentence from it (e.g. BART, T5). Mainly used for translation and summarization.

Advantages of AutoModel

  • As with AutoTokenizer, the AutoModel classes load the architecture that matches the checkpoint, with a variant for each kind of task.
  • For example, AutoModelForSequenceClassification loads the encoder model with a sequence classification head added on top.

Model inputs and outputs

  • Passing inputs: the dictionary returned by the tokenizer can be passed to the model concisely with ** (dictionary unpacking), as in model(**model_inputs).
  • Reading the outputs:
    • The model generally returns an object that contains logits.
    • If labels are passed along with the inputs, the model computes the loss and includes it in the output, which simplifies a PyTorch training loop considerably.
    • Calling loss.backward() runs backpropagation to compute the gradients; an optimizer step then updates the model's weights.
# Pass labels along with the inputs so the loss is computed automatically
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])  # 1 = POSITIVE, 0 = NEGATIVE

outputs = model(**inputs, labels=labels)
print(f"Logits: {outputs.logits}")
print(f"Loss: {outputs.loss}")  # the loss is returned alongside the logits

# Backpropagate the loss (this computes gradients; an optimizer would apply the update)
loss = outputs.loss
loss.backward()
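
For single-label classification, the loss returned when labels are passed is, as far as I understand it, the standard cross-entropy averaged over the batch. The sketch below reproduces that calculation in plain Python; the logits and labels are hypothetical and the function name cross_entropy is my own.

```python
import math

def cross_entropy(logits_batch, labels):
    """Mean negative log-probability of the correct class over the batch."""
    losses = []
    for logits, label in zip(logits_batch, labels):
        # log-softmax computed stably via the log-sum-exp trick
        biggest = max(logits)
        log_sum_exp = biggest + math.log(sum(math.exp(x - biggest) for x in logits))
        log_prob_correct = logits[label] - log_sum_exp
        losses.append(-log_prob_correct)
    return sum(losses) / len(losses)

# Hypothetical logits for two sentences over [NEGATIVE, POSITIVE]
example_logits = [[-1.5, 1.5], [2.0, -2.0]]
example_labels = [1, 0]
loss = cross_entropy(example_logits, example_labels)
print(f"loss = {loss:.4f}")
```

The loss is small here because both hypothetical predictions already favor the correct class; swapping the labels would make it much larger.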

Looking inside the model

  • Setting output_attentions=True and output_hidden_states=True when loading the model makes it return the attention weights and hidden states computed at each layer.
  • These help analyze which parts of a sentence the model attends to and how the representation changes from layer to layer, which supports interpretability work.
# Load the model so that it returns attention weights and hidden states
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    output_attentions=True,
    output_hidden_states=True,
)

# Run the model
outputs = model(**inputs)

# Inspect the outputs (the tensors are large, so only print their shapes)
print(f"Shape of the first hidden state: {outputs.hidden_states[0].shape}")
print(f"Shape of the first attention weights: {outputs.attentions[0].shape}")

Going deeper: how BERT and GPT differ

  • BERT (Bidirectional Encoder Representations from Transformers):
    • Technical background: uses the encoder architecture and draws on context in both directions to determine what a word means. It is trained with masked language modeling (MLM), filling in a blank in the middle of a sentence such as "I went to [MASK] and had a meal."
    • Recent developments: RoBERTa, ALBERT, ELECTRA, and other variants followed BERT and improved on its contextual understanding.
    • Limitation: it is not inherently suited to text generation.
  • GPT (Generative Pre-trained Transformer):
    • Technical background: uses the decoder architecture and learns only left-to-right, unidirectional context. It is trained to predict the next word, for example the word that follows "I went to school and".
    • Recent developments: from GPT-2 and GPT-3 through to GPT-4, larger models trained on larger datasets have shown very strong text generation and dialogue ability.
    • Limitation: because it does not use bidirectional context, it can underperform BERT-family models on some NLP tasks that depend on the subtle meaning of a whole sentence.
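
The difference between the two training setups can be sketched in plain Python: which positions each token is allowed to see, and what the training targets look like. The function names and the four-word example are hypothetical, and real MLM training chooses the masked positions at random rather than fixing one by hand.

```python
def causal_mask(seq_len):
    """mask[i][j] == 1 means position i may attend to position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

def bidirectional_mask(seq_len):
    """Every position may attend to every other position."""
    return [[1] * seq_len for _ in range(seq_len)]

tokens = ["I", "went", "to", "school"]

# GPT-style next-token pairs: (context so far, token to predict)
next_token_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# BERT-style masked example: hide one position and predict it from both sides
masked_position = 3
mlm_input = [t if i != masked_position else "[MASK]" for i, t in enumerate(tokens)]
mlm_target = tokens[masked_position]

print(causal_mask(len(tokens)))
print(next_token_pairs)
print(mlm_input, "->", mlm_target)
```

The lower-triangular causal mask is what lets a decoder generate text one token at a time, while the all-ones mask is what gives an encoder access to context on both sides of a word.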

5. Fine-tuning a Model in Practice

Fine-tuning means further training a pre-trained model on a specific task and dataset. The lecture walks through sentiment analysis on the IMDb movie review dataset as its example.

Preparing the data

  • Load the IMDb dataset with load_dataset from the datasets library.
  • Tokenize the whole dataset in one pass with map and the tokenizer; batched=True speeds this up.
  • Remove unneeded columns and rename the label column to labels so the model recognizes it.
  • Convert the dataset to PyTorch tensors with set_format('torch').
  • Create the training and validation data loaders with torch.utils.data.DataLoader.
from datasets import load_dataset

# 1. Load a dataset (this snippet uses MRPC from the GLUE benchmark as its example, not IMDb)
raw_datasets = load_dataset("glue", "mrpc")

# 2. Define the tokenization function (MRPC examples are sentence pairs)
def tokenize_function(examples):
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

# 3. Apply the tokenization to the whole dataset with map
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

Fine-tuning method 1: a manual PyTorch training loop

  • Set up an optimizer (AdamW) and a learning rate scheduler (get_scheduler). get_scheduler comes from the transformers library; the AdamW that transformers used to ship has been deprecated in favor of torch.optim.AdamW.
  • As in ordinary PyTorch code, you write the training loop over epochs and batches yourself to fine-tune the model.
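
The shape of such a loop, and what a linear learning rate schedule does across it, can be sketched without any deep learning library. This is an illustration under stated assumptions: linear_schedule is my own helper mimicking a linear warmup-then-decay schedule, and the step counts and base learning rate are arbitrary.

```python
def linear_schedule(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

num_epochs = 3
batches_per_epoch = 4            # stand-in for len(train_dataloader)
num_training_steps = num_epochs * batches_per_epoch

step = 0
lr_history = []
for epoch in range(num_epochs):
    for batch_index in range(batches_per_epoch):
        lr = linear_schedule(step, base_lr=5e-5, num_warmup_steps=0,
                             num_training_steps=num_training_steps)
        lr_history.append(lr)
        # In a real loop, this is where the forward pass, loss.backward(),
        # optimizer.step(), lr_scheduler.step() and optimizer.zero_grad() would go.
        step += 1

print(lr_history[0], lr_history[-1])
```

The total number of training steps has to be known up front (epochs times batches per epoch), since the schedule is defined relative to it.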

Fine-tuning method 2: the Trainer class

  • Hugging Face provides the Trainer API, which makes fine-tuning much simpler.
  • TrainingArguments: the class that holds the training settings, such as the learning rate, batch size, number of epochs, and the directory where outputs and logs are saved.
  • Trainer: takes the model, the training arguments, the datasets, the tokenizer, and a metric function (compute_metrics), and manages the whole training process.
  • trainer.train(): starts fine-tuning with a single line of code.
  • trainer.predict(): runs predictions with the trained model.
  • Callbacks: adding a callback such as EarlyStoppingCallback makes it easy to add extras like stopping training early once validation performance stops improving.
from transformers import TrainingArguments, Trainer

# 1. Define the training arguments
training_args = TrainingArguments(
    output_dir="my_awesome_model",        # directory where outputs are saved
    evaluation_strategy="epoch",          # evaluate every epoch (named eval_strategy in newer transformers releases)
    num_train_epochs=3,                   # total number of training epochs
    per_device_train_batch_size=16,       # batch size for training
    per_device_eval_batch_size=16,        # batch size for evaluation
)

# 2. Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,                  # newer releases call this argument processing_class
)

# 3. Start training
trainer.train()
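
The idea behind early stopping can be sketched in a few lines of plain Python: stop once the validation metric has failed to improve for a set number of evaluations in a row (the patience). The function name should_stop_early and the loss history are hypothetical; this shows the logic only, not the library's implementation.

```python
def should_stop_early(eval_losses, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best:
            best = loss                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch
history = [0.60, 0.45, 0.47, 0.46, 0.48, 0.40]
print(should_stop_early(history, patience=2))
```

A small patience saves compute but can stop before a later improvement, as the last value in this hypothetical history shows; a larger patience trades extra epochs for that robustness.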

Saving and loading a model

  • During training, Trainer automatically saves model checkpoints under the configured output directory.
  • Passing a checkpoint path to from_pretrained() on the matching class (here AutoModelForSequenceClassification, so the classification head is restored as well) loads the fine-tuned model again later.
# Save the model once training has finished
trainer.save_model("my_final_model")

# Load the saved model again
from transformers import AutoModelForSequenceClassification

loaded_model = AutoModelForSequenceClassification.from_pretrained("my_final_model")