Efficient Fine-Tuning Library - Unsloth

nebchi · April 29, 2024

LLM NLP

Unsloth

Unsloth is introduced as a library optimized for fine-tuning: it claims faster training and up to 60% lower memory use during fine-tuning, with 0% loss in accuracy.
So today we will look at how to do a lightweight fine-tune of LLAMA with Unsloth.
For details, see the Hugging Face blog post at https://huggingface.co/blog/Andyrasika/finetune-unsloth-qlora

Fine-tuning with Unsloth

import torch
major_version, minor_version = torch.cuda.get_device_capability()

First, following the tutorial, we use torch.cuda to read the CUDA compute capability of the current GPU into major_version and minor_version.

Installing the Unsloth library

if major_version >= 8:
    # Install the Unsloth library for Ampere and Hopper architecture from GitHub
    !pip install "unsloth[colab_ampere] @ git+https://github.com/unslothai/unsloth.git" -q
else:
    # Install the Unsloth library for older GPUs from GitHub
    !pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git" -q

After that, install the Unsloth build that matches your GPU, and the fine-tuning environment is ready.
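As a rough sketch, the branch above boils down to a small helper. The function name `pick_unsloth_extra` is hypothetical and only for illustration; the two extras names are taken from the install commands above.

```python
def pick_unsloth_extra(major_version: int) -> str:
    """Return the pip extra used above for a given compute capability major version."""
    # Compute capability 8.x and above is treated as Ampere or newer,
    # matching the `major_version >= 8` check in the install cell.
    return "colab_ampere" if major_version >= 8 else "colab"


for major in (7, 8, 9):
    print(major, pick_unsloth_extra(major))
```

On a real machine, `major_version` would come from `torch.cuda.get_device_capability()` as shown earlier.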

Loading the model and configuring quantization

from unsloth import FastLanguageModel

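max_seq_length = 2048  # maximum sequence length used for training

dtype = None  # None lets Unsloth detect the dtype automatically

# Load the weights with 4-bit quantization
load_in_4bit = True

# Load the model and tokenizer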
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="beomi/Llama-2-KoEn-13B-v2",  
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    cache_dir='/data'
)

Here the Unsloth library loads the LLAMA2 model and applies 4-bit quantization to it.
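To get a feel for why 4-bit loading matters, here is a back-of-the-envelope estimate of the memory needed just for the weights. It assumes roughly 13 billion parameters (from the model name) and ignores quantization constants, activations, optimizer state, and the LoRA adapters, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 13e9  # assumption: about 13B parameters

print("16-bit:", weight_memory_gb(num_params, 16), "GB")
print(" 4-bit:", weight_memory_gb(num_params, 4), "GB")
```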
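For the model, I used the well-known Korean LLAMA2 by Junbum Lee (beomi). The reason is that about 95% of the training data for LLAMA2 and GEMMA is English, and the rest is spread across many other languages.
As a result, their tokenizers contain few Korean tokens and cannot tokenize Korean properly, so the LLM fails to understand the input and training does not go well.
Ko-Llama2, built by Junbum Lee, adds Korean tokens to the tokenizer so that the model can learn Korean, and it shows strong Korean comprehension, which is why I chose it.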
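The effect of a small Korean vocabulary can be illustrated with a toy model of byte fallback. This is a simplified sketch, not the actual LLAMA tokenizer: it assumes that any character missing from the vocabulary is split into its UTF-8 bytes, one token per byte. The function name and the toy vocabularies are made up for the example.

```python
def toy_token_count(text: str, vocab: set) -> int:
    """Count tokens when out-of-vocabulary characters fall back to UTF-8 bytes."""
    count = 0
    for ch in text:
        if ch in vocab:
            count += 1  # the character has its own token
        else:
            count += len(ch.encode("utf-8"))  # one token per byte
    return count


text = "안녕하세요"
english_only_vocab = set("abcdefghijklmnopqrstuvwxyz ")
korean_extended_vocab = english_only_vocab | set(text)

print(toy_token_count(text, english_only_vocab))
print(toy_token_count(text, korean_extended_vocab))
```

Each Hangul syllable takes 3 bytes in UTF-8, so in this toy setup the same sentence costs three times as many tokens without Korean entries in the vocabulary.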

Setting the PEFT parameters

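model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    # Linear layers that receive LoRA adapters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",],
    lora_alpha=32,  # LoRA scaling factor
    lora_dropout=0.05,  # dropout applied to the LoRA layers
    bias="none",  # do not train bias terms
    use_gradient_checkpointing=True,  # trade extra compute for lower memory
    max_seq_length=max_seq_length,
)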

Once the parameters for LoRA fine-tuning are set, the preparation is complete.
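To see how small the trainable part is with these settings, the sketch below counts the LoRA parameters. For a linear layer of shape d_in x d_out, LoRA adds two matrices with r * (d_in + d_out) parameters in total. The layer sizes are assumptions based on the standard Llama-2-13B architecture (hidden size 5120, MLP size 13824, 40 layers); they are not read from the model loaded above.

```python
# Assumed Llama-2-13B dimensions (not taken from the loaded checkpoint)
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40

# (d_in, d_out) for each target module in one transformer layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r: int) -> int:
    """Total LoRA parameters: r * (d_in + d_out) per adapted layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


r, lora_alpha = 16, 32
print("trainable LoRA params:", lora_param_count(r))
print("LoRA scaling (alpha / r):", lora_alpha / r)
```

Under these assumptions the adapters add about 62.6 million parameters, well under 1% of a 13B model.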

Applying the chat template to the dataset

# Apply the chat template
def formatting_prompts_func(examples):
    output_texts = []
    for i in range(len(examples['instruction'])):
        messages = [
            {"role": "user",
             "content": "{}".format(examples['instruction'][i])},
            {"role": "assistant",
             "content": "{}".format(examples['output'][i])}
        ]
        # This method applies the default chat template for the LLAMA model class
        chat_message = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
        output_texts.append(chat_message)

    # With batched=True, map() expects a dict of columns;
    # SFTTrainer later reads the "text" column
    return {"text": output_texts}

from datasets import load_dataset
dataset = load_dataset("nebchi/kor-resume", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True,)
dataset

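The dataset also needs the chat template that matches the base LLAMA2 model applied to it.
The chat template is arguably the most important part of working with chat and instruct models. A pretrained language model is optimized for inputs in the format of the data it was trained on, so if the format does not match, the model may fail to understand the input and hallucinate.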
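For intuition, the sketch below hand-writes an approximation of the Llama-2 chat format for a single user/assistant turn. It is only an illustration: in the post the real string comes from `tokenizer.apply_chat_template`, and the exact output depends on the tokenizer and library version. The function names are hypothetical.

```python
BOS, EOS = "<s>", "</s>"


def format_llama2_turn(instruction: str, output: str) -> str:
    """Approximate Llama-2 chat formatting for one user/assistant exchange."""
    return f"{BOS}[INST] {instruction.strip()} [/INST] {output.strip()} {EOS}"


def format_batch(examples: dict) -> dict:
    """Batched version returning a 'text' column, the shape datasets.map expects."""
    texts = [
        format_llama2_turn(inst, out)
        for inst, out in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}


batch = {"instruction": ["Introduce yourself."], "output": ["I am an NLP developer."]}
print(format_batch(batch)["text"][0])
```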

Running SFT fine-tuning with TRL

import torch
from trl import SFTTrainer
# Import SFTTrainer from the TRL library

from transformers import TrainingArguments
# Import TrainingArguments from the Transformers library

trainer = SFTTrainer(
    # Initialize the SFTTrainer

    model=model,
    # Specify the model to be used

    train_dataset=dataset,
    # Specify the training dataset

    dataset_text_field="text",
    # Specify the text field in the dataset

    max_seq_length=max_seq_length,
    # Specify the maximum sequence length

    args=TrainingArguments(
        # Specify training arguments

        per_device_train_batch_size=2,
        # Specify the training batch size per device

        gradient_accumulation_steps=4,
        # Specify the number of steps for gradient accumulation

        warmup_steps=5,
        # Specify the number of warm-up steps

        max_steps=20,
        # Specify the maximum number of steps

        learning_rate=2e-4,
        # Specify the learning rate

        fp16=not torch.cuda.is_bf16_supported(),
        # Set whether to use 16-bit floating-point precision (fp16)

        bf16=torch.cuda.is_bf16_supported(),
        # Set whether to use Bfloat16

        logging_steps=1,
        # Specify the logging steps

        optim="adamw_8bit",
        # Specify the optimizer (here using 8-bit AdamW)

        weight_decay=0.01,
        # Specify the weight decay value

        lr_scheduler_type="linear",
        # Specify the type of learning rate scheduler (linear)

        seed=3407,
        # Specify the random seed

        output_dir="outputs",
        # Specify the output directory

    ),
)
trainer.train()

When you run SFT fine-tuning through the trl library like this, Unsloth is said to use up to 60% less memory than conventional fine-tuning, so the fine-tune runs more efficiently.
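A few numbers follow directly from the TrainingArguments used here. The sketch below works out the effective batch size and a linear warmup-then-decay learning rate, assuming a single GPU. The schedule function is a plain-Python approximation of a linear scheduler with warmup, not the Transformers implementation itself.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
warmup_steps = 5
max_steps = 20
learning_rate = 2e-4

# Sequences contributing to each optimizer step (single-GPU assumption)
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# Total sequences processed over the whole run
sequences_seen = effective_batch_size * max_steps


def linear_lr(step: int) -> float:
    """Linear warmup to learning_rate, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * max(0.0, (max_steps - step) / (max_steps - warmup_steps))


print("effective batch size:", effective_batch_size)
print("sequences seen:", sequences_seen)
for step in (0, 5, 20):
    print(step, linear_lr(step))
```

With max_steps=20 the run only touches 160 sequences, so this configuration is a quick smoke test rather than a full training run.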
Today we looked at Unsloth, an efficient fine-tuning library, alongside Peft, the most widely used fine-tuning library.
