[코드리뷰] Trainer.py

kaeul·2024년 10월 17일

class Trainer
""" Trainer is a simple but feature-complete training and eval loop for PyTorch, optimized for 🤗 Transformers.

[Trainer] is optimized to work with the [PreTrainedModel] provided by the library. You can still use your own models defined as torch.nn.Module as long as they work the same way as the 🤗 Transformers models. """

args, seed, deepspeed와 같은 기본 설정
self.args = args
enable_full_determinism(Self.args.seed) if self.args.full_determinism else set_seed(self.args.seed)
self.hp_name = None
self.deepspeed = None
self.is_in_train = False
accelerator 설정

accelerator, gradient_accumulation_kwargs(num_steps, gradient_accumulation_steps 체크
Accelerator 객체 생성 후 accelerator.state가 deepspeed인지, fsdp인지 확인 후 post accelerator 생성
1) fsdp
- limit_all_gathers
- fsdp_plugin.activation_checkpointing와 self.args.gradient_checkpointing은 같이 사용할 수 없음.
2) deepspeed
save_only_model은 DeepSpeed/FSDP를 wrapper로 사용하면 사용 불가.
auto_find_batch_size는 아직 DeepSpeed/FSDP에 지원 x

--> 추후 다시 정리
self.create_accelerator_and_postprocess()

memory 설정 (가능한 빨리 하는게 좋음) ##
self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)
self._memory_tracker.start()
node에 따른 올바른 log level 설정
log_level = args.get_process_log_level()
logging.set_verbosity(log_level)
force device and distributed setup init explicitly
args._setup_devices
model 설정

trainer는 model이나 model_init이 필요하다. 둘다 값이 존재하면 model_init이 model을 overwrite
model은 없지만, model_init이 있으면 self.call_model_init() 으로 모델을 설정해주는데, 개수가 0, 1 ## 확인 필요
model.class.name이 허깅페이스 AutoClass에서 지원해주는 MODEL_MAPPING_NAMES에 있는지 확인. ## model.class.name은 어떻게 하는거지??
모델 로드 후 병렬처리와 양자화 여부 확인
model에서 is_parallelizable와 model_parallel 인자로 prarallel 여부 확인 후 self.is_model_parallel 설정
model에 hf_device_map 확인 ## device의 개수가 1 보다 크면 패러랠??
devices = [device for device in set(model.hf_device_map.values()) if device not in ["cpu", "disk"]]
if len(devices) > 1:
self.is_model_parallel = True
elif len(devices) == 1: ##
self.is_model_parallel = self.args.device != torch.devie(devices[0])
else:
self.is_model_parallel = False
quantization 여부에 따라서
_is_quantized_and_base_model = getattr(model, "is_quantized", False) and not getattr(
model, "_hf_peft_config_loaded", False
)

양자화 + compiled 모델 필터링 ## 왜 양자화된 모델은 compile된거면 안되지?
if _is_quantized_and_base_model and hasattr(model, "_orig_mod"):
raise ValueError(
"You cannot fine-tune quantized model with torch.compile() make sure to pass a non-compiled model when fine-tuning a quantized model with PEFT"
)

모델을 device에 놓을지 gpu cuda에 옮기는걸 연기할지 여부 확인

# one place to sort out whether to place the model on device or not
# postpone switching model to cuda when:
# 1. MP - since we are trying to fit a much bigger than 1 gpu model
# 2. fp16-enabled DeepSpeed loads the model in half the size and it doesn't need .to() anyway,
#    and we only use deepspeed for training at the moment
# 3. full bf16 or fp16 eval - since the model needs to be cast to the right dtype first
# 4. FSDP - same as MP

병렬처리: self.place_model_on_device = False
default_collator = DataCollatorWithPadding(tokenizer)
self.data_collator = data_collator if data_collator is not None else default_data_collator
train_dataset, eval_dataset, tokenizer 세팅
Bnb 양자화된 모델은 .to 오퍼레이션 지원 안됨.
Model Parallel이 gpu를 관리하면 n_gpu를 1로 둬서 Data Parallel 피하기 ###
if self.is_model_parallel:
self.args._n_gpu=1
model_init은 optimizer와 함께 쓰일 수 없기 때문에 Trainer의 sibclass인 create_optimizer_and_scheduler에서 override 해야 한다.
모델은 디바이스에 올려 놓고 optimizer를 생성해야 하며, DeepSpeed와 FSDP가 활성화 되어 있으면 optimizer를 같이 사용할 수 없다.
callback 관련 설정
sagemaker에서 mixed precision 관련 -- 지금은 사용 안하나?
label smoothing관련 설정
self.control = TrainerControl()
self.state = TrainerState()
trainer_callback.py
class TrainerState:
""" A class containing the [Trainer] inner state that will be saved along the model and optimizer when checkpointing
and passed to the [TrainerCallback]."""
- one step을 의미하며, gradient_accumulation_steps=n을 했을 땐 n번의 배치에 대해 forward, backward가 일어남.
  
  https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L1530

kaeul

Deep learning

이전 포스트

extend context length

다음 포스트

[코드리뷰] Trainer.py

extend context length

[Transformers] modeling_utils.py

0개의 댓글