[LLM] IBM granite-3.0-8b-instruct FineTuning

건희 · November 1, 2024

granite-3.0-8b

A multilingual Q&A language model developed by IBM.

Fine-tuning process

This post was written with reference to the following guide.
https://www.ibm.com/granite/docs/how-to/fine-tuning/granite/

While fine-tuning by following that guide, I ran into a problem when saving the trained model and loading it back, so this post covers that part as well.

0. Installing the required libraries

!pip install "transformers>4.45.2" datasets accelerate bitsandbytes peft trl

1. Dataset preparation

I used the dataset at the link below.
https://huggingface.co/datasets/Ammad1Ali/Korean-conversational-dataset

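The granite 3.0 fine-tuning setup expects training data made of a prompt (the user's question) and a response (the answer). I therefore split each record of this dataset at the [INST] and [/INST] markers into prompt and response, saved the result as korean-conversation.csv, and used that file.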
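The conversion itself is not shown in this post, so here is a minimal sketch of how it could be done. It assumes each record is a single string shaped like `<s>[INST] question [/INST] answer </s>`; the helper name `split_inst` and that exact record shape are my assumptions, not something taken from the dataset card.

```python
import re

# Capture the text between [INST] and [/INST] as the prompt,
# and everything after [/INST] as the response.
INST_PATTERN = re.compile(r"\[INST\](.*?)\[/INST\](.*)", re.DOTALL)


def split_inst(text):
    """Split one '[INST] ... [/INST] ...' record into prompt and response.

    Returns None when the markers are missing so that bad rows can be skipped.
    """
    match = INST_PATTERN.search(text)
    if match is None:
        return None
    prompt = match.group(1).strip()
    # Drop a trailing end-of-sequence tag if the record carries one.
    response = match.group(2).replace("</s>", "").strip()
    return {"prompt": prompt, "response": response}


record = "<s>[INST] 오늘 뭐 먹지? [/INST] 김치찌개 어때요? </s>"
print(split_inst(record))
```

Rows converted this way can be written out with the csv module or pandas under the column names prompt and response, which are the names the formatting function in step 4 reads.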

from datasets import load_dataset

# load local csv file 
dataset = load_dataset('csv', data_files='korean-conversation.csv')
dataset = dataset['train'].train_test_split(test_size=0.2)

# check data
print(dataset['train'][0])

2. Loading and quantizing the model

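Quantize the model so that fine-tuning is possible even on a modest GPU. (The quantization itself is done by bitsandbytes through BitsAndBytesConfig, while the peft library provides the LoRA adapters that are trained on top of the quantized model. I plan to cover this in more detail later.)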

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

# Original IBM granite-3.0 model, downloaded from Hugging Face to a local directory beforehand
model_checkpoint = "./granite-3.0-8b-instruct" 
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_use_double_quant = True,
    bnb_4bit_compute_dtype = torch.float16 # if unset, a warning about slow speed is shown
)


model = AutoModelForCausalLM.from_pretrained(
    model_checkpoint,
    quantization_config = bnb_config,
    device_map = "auto"
)
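To get a feel for why 4-bit loading matters, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It assumes roughly 8 billion parameters and ignores quantization constants, activations, and the LoRA and optimizer state, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: about 8 billion parameters

print(f"fp16 : {weight_memory_gb(NUM_PARAMS, 16):.1f} GB")
print(f"4-bit: {weight_memory_gb(NUM_PARAMS, 4):.1f} GB")
```

Under these assumptions the weights shrink from about 16 GB to about 4 GB.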

3. Model sanity check

Before fine-tuning, run a quick check on the loaded model. Note the result, then try the same prompt again after fine-tuning and compare the model's output.

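# The prompt means "The weather is nice today".
# The role tags follow the <|user|> / <|assistant|> layout used for training in step 4.
input_text = "<|user|>\n오늘 날씨 좋다\n<|assistant|>\n"
# Move the inputs to the device the model was placed on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))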

4. Training setup

This section prepares the environment for training.

There are three parts:
1. Define the training prompt format for the output expected from the model (formatting_prompts_func).
2. Apply QLoRA.
3. Set up SFTTrainer.

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['prompt'])):
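        # The system prompt is in Korean. Roughly: "You are a very capable assistant.
        # You must answer in Korean. Leave out openers like 'I can't do that because I'm an AI'."
        text = f"<|system|>\n너는 매우 유능한 어시스턴트야. 한국어로 대답해주어야 해. 문장의 앞에 너가 인공지능이어서 못해드립니다 같은 말은 빼줘\n<|user|>\n{example['prompt'][i]}\n<|assistant|>\n{example['response'][i]}<|endoftext|>"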
        output_texts.append(text)
    return output_texts

response_template = "\n<|assistant|>\n"

from trl import DataCollatorForCompletionOnlyLM

response_template_ids = tokenizer.encode(response_template, add_special_tokens=False)[2:]
collator = DataCollatorForCompletionOnlyLM(response_template_ids, tokenizer=tokenizer)
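The completion-only collator is what restricts the loss to the response part. The sketch below shows the idea with plain lists of token IDs: everything up to and including the response template gets the label -100, which PyTorch's cross-entropy loss ignores by default. It is a simplified illustration, not TRL's actual implementation, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Build labels that keep only the tokens after the response template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: no token in this example contributes to the loss.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Hypothetical IDs: [90, 91] stands in for the tokenized response template.
input_ids = [5, 6, 7, 90, 91, 11, 12, 2]
print(mask_prompt_labels(input_ids, [90, 91]))
```

This is also why the template IDs have to match the tokenization inside the full prompt exactly. If the match fails, the whole example is masked and nothing is learned from it.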


# Apply qLoRA
qlora_config = LoraConfig(
    r=16,  # The rank of the Low-Rank Adaptation
    lora_alpha=32,  # Scaling factor for the adapted layers
    target_modules=["q_proj", "v_proj"],  # Layer names to apply LoRA to
    lora_dropout=0.1,
    bias="none"
)

# Initialize the SFTTrainer
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-4,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    num_train_epochs=3,
    logging_steps=100,
    fp16=True,
    report_to="none"
)

max_seq_length = 250

# device_map="auto" already placed the quantized model on the GPU,
# so no model.to(device) call is made here (bitsandbytes 4-bit models can reject .to()).
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    tokenizer=tokenizer,
    peft_config=qlora_config,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    max_seq_length=max_seq_length,
)

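To see how small the trained part is with the LoRA settings above, the number of adapter parameters can be worked out by hand. Each adapted linear layer gets two matrices, A of shape (r, in_features) and B of shape (out_features, r). The layer sizes below (hidden size 4096, v_proj output 1024, 40 layers) are my assumptions about granite-3.0-8b; check them against the model's config.json before relying on the totals.

```python
def lora_param_count(rank, in_features, out_features):
    """Parameters added by one LoRA adapter: A is (rank, in), B is (out, rank)."""
    return rank * (in_features + out_features)


# Assumed architecture values, not read from the actual model config.
HIDDEN_SIZE = 4096   # q_proj maps hidden -> hidden
V_PROJ_SIZE = 1024   # v_proj output is smaller with grouped-query attention
NUM_LAYERS = 40

RANK = 16        # r in LoraConfig
LORA_ALPHA = 32  # lora_alpha in LoraConfig

per_layer = (
    lora_param_count(RANK, HIDDEN_SIZE, HIDDEN_SIZE)    # q_proj
    + lora_param_count(RANK, HIDDEN_SIZE, V_PROJ_SIZE)  # v_proj
)
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions only about 8.5 million parameters are trained, a tiny fraction of the roughly 8 billion in the base model.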
5. Training process

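Train the model with the trainer created in the previous step.
Calling trainer.save_model() is a must when fine-tuning with quantization.
(When I loaded the model the way the IBM guide describes, the warning below gave me a really hard time.)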

Some weights of the model checkpoint at ./granite-3.0-8b-instruct-finetuned-epoch3 were not used when initializing GraniteForCausalLM:

-> Model output

<|user>한국의 수도는 어디야?
<|assistant|>
.0.. (. ( ( (... (Provided. (<<<<... ( ( (<<<< (0......... (00 ( ( ( ( ( (0. (<<<<. (<<<< ( of
0. (000 (000 ( (<<<<. (0 (++). (. (0 (0 (2.++). (40.00.0 (3. ( ( (0. ( means.30 (Provided (. (0 ( (.0.00000 ( ( (las.2.5 ( (.ë.et.3 (0 ( ( (0 ( ( _. E.3 (00 ( ( ( (4 (00. ( ( ( ( (00 ( ( (.2. by ( ( ( ( ( ( ( (.. ( (0. (00 ( ( (0 ( ( ( of ( (0 (0 (0 ( (00 ( ( ( ( ( ( ( ( (0 ( ( ( ( ( ( (.uf.

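I still want to find out the cause. From what I read, a warning like that can supposedly be ignored, but the output above says otherwise.
In any case, I managed to find a fix, and its first part is to save through the trainer.
The second part is explained in the section on loading the model.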

trainer.train()

# Important: save through the trainer so that PeftModel.from_pretrained() can load the QLoRA adapter (weights and config) later.
trainer.save_model("./granite-3.0-8b-instruct-finetuned-epoch3")

6. Saving the fine-tuned model

model.save_pretrained("./granite-3.0-8b-instruct-finetuned-epoch3")
tokenizer.save_pretrained("./granite-3.0-8b-instruct-finetuned-epoch3")
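Since the loading trouble came down to what was actually written to disk, a quick look at the output directory can help. The sketch below only checks for two file names: adapter_config.json, which PEFT writes next to adapter weights, and config.json, which transformers writes for a model. The helper name `describe_checkpoint` is hypothetical, and the check says nothing about whether the weights themselves are usable.

```python
from pathlib import Path


def describe_checkpoint(directory):
    """Report which kind of checkpoint files a directory contains."""
    path = Path(directory)
    has_adapter = (path / "adapter_config.json").is_file()  # written by PEFT
    has_model_config = (path / "config.json").is_file()     # written by transformers
    if has_adapter and has_model_config:
        return "adapter and model config"
    if has_adapter:
        return "adapter only"
    if has_model_config:
        return "model config only"
    return "no checkpoint files found"


print(describe_checkpoint("./granite-3.0-8b-instruct-finetuned-epoch3"))
```

If adapter_config.json is missing, PeftModel.from_pretrained() has nothing to load from that directory.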

7. Comparing results with step 3

Once fine-tuning is done, it is time to compare the output for '오늘 날씨 좋다' ("The weather is nice today") from step 3 with the output after fine-tuning. The code is the same.

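# Same prompt as in step 3, with role tags matching the training format
input_text = "<|user|>\n오늘 날씨 좋다\n<|assistant|>\n"
# Move the inputs to the device the model was placed on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))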

Using the fine-tuned model

In '5. Training process' of the fine-tuning walkthrough I saved through the trainer, and in '6. Saving the fine-tuned model' I saved the model and tokenizer.

To use the fine-tuned model, load it with the following steps.

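1. Load the pretrained tokenizer with AutoTokenizer.
2. Create bnb_config with BitsAndBytesConfig, using the same quantization settings as during fine-tuning.
3. Load the pretrained model with AutoModelForCausalLM (passing bnb_config is required).
4. Attach the saved adapter with PeftModel.from_pretrained().
5. Put the model in eval() mode (this disables dropout for inference; it does not by itself turn off gradient tracking).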

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import PeftModel, PeftConfig


device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "./granite-3.0-8b-instruct-finetuned-epoch1"

tokenizer = AutoTokenizer.from_pretrained(model_path)

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_use_double_quant = True,
    bnb_4bit_compute_dtype = torch.float16 # if unset, a warning about slow speed is shown
)

model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, './granite-3.0-8b-instruct-finetuned-epoch1')
model.eval()

Loaded this way, the model comes up cleanly with no warning!

Testing the model

# The prompt means "Who are you?"; role tags match the training format
input_text = "<|user|>\n너는 누구야?\n<|assistant|>\n"
input_tokens = tokenizer(input_text, return_tensors="pt").to(device)
stop_token = "<|endoftext|>"
stop_token_id = tokenizer.encode(stop_token)[0]

outputs = model.generate(**input_tokens, max_new_tokens=250, eos_token_id=stop_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
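The decoded text above contains the prompt as well as the answer, because generate() on a causal language model returns the prompt tokens followed by the new ones. If only the answer is wanted, one option is to drop the prompt tokens before decoding. The sketch below shows that slicing on a plain list; the helper name `strip_prompt_tokens` is hypothetical.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return list(output_ids[prompt_length:])


# Made-up IDs: the first three tokens are the prompt, the rest were generated.
output_ids = [101, 102, 103, 7, 8, 9]
print(strip_prompt_tokens(output_ids, prompt_length=3))
```

With the variables from the test code, the prompt length would be input_tokens["input_ids"].shape[1], and the sliced IDs can be passed to tokenizer.decode() as before.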