[Modeling] Baseline Error

fla1512 · July 1, 2023

TypeError

Error: TypeError: __init__() got an unexpected keyword argument 'checkpoint_callback'
- Solution: search the code for checkpoint_callback = False, and comment that line out by putting a # in front of it
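
When the same script has to run against several library versions, one alternative to editing the call by hand is to drop keyword arguments that the installed constructor no longer accepts. The sketch below is only an illustration: FakeTrainer is a hypothetical stand-in (not the real PyTorch Lightning Trainer), and filter_kwargs is a hypothetical helper name.

```python
import inspect


class FakeTrainer:
    """Hypothetical stand-in for a trainer whose __init__ no longer takes checkpoint_callback."""

    def __init__(self, max_epochs=1):
        self.max_epochs = max_epochs


def filter_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__init__ accepts by name."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}


old_style = {"max_epochs": 3, "checkpoint_callback": False}

# Passing the removed argument reproduces the same kind of TypeError.
try:
    FakeTrainer(**old_style)
except TypeError as err:
    print("TypeError:", err)

# Dropping the unsupported key lets the constructor succeed.
trainer = FakeTrainer(**filter_kwargs(FakeTrainer, old_style))
print(trainer.max_epochs)
```

Note that dropping an argument also drops whatever it used to configure, so it is worth checking the release notes for a replacement option.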

Other

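trainer.test(model, test_dataloaders=model.test_dataloader())
- This error occurs because the accepted arguments changed
- Solution:
  https://pytorch-lightning.readthedocs.io/en/1.7.0/common/trainer.html#testing
  - Open the official PyTorch Lightning documentation
  - Check whether the way arguments are passed has changed! The accepted arguments often change between versions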
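
Before opening the documentation, it helps to know which version is actually installed so that the matching docs page can be read. A minimal sketch using only the standard library; installed_version is a hypothetical helper name, and "pytorch-lightning" is the package's distribution name on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("pytorch-lightning"))
```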

RuntimeError

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x1 and 1024x100)
- In most cases the input size does not match what the model expects, so the wrong shape was fed into the model
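
The message can be read as a plain matrix-product rule: the second dimension of mat1 must equal the first dimension of mat2. Here the batch of 64 rows carries 1 feature, while the weight expects 1024. A small sketch in plain Python (no PyTorch; matmul_shape is a hypothetical helper) that applies the same rule:

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product, or raise if the inner dimensions differ."""
    (n, k_a), (k_b, m) = a_shape, b_shape
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    return (n, m)


# A batch of 64 samples with 1024 features fits a 1024x100 weight.
print(matmul_shape((64, 1024), (1024, 100)))  # (64, 100)

# The shapes from the error message do not fit.
try:
    matmul_shape((64, 1), (1024, 100))
except ValueError as err:
    print("ValueError:", err)
```
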
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
- Reference: https://deep-learning-study.tistory.com/918
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
- Cause: the label values fall outside the valid range, which is what the failed assertion reports
- Fix: y.data.sub_(1) # subtract 1 from every value in the tensor (e.g. 2 -> 1, 1 -> 0)
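
The assertion t >= 0 && t < n_classes requires every label to lie in the range 0 to n_classes - 1. A quick check in plain Python, using made-up labels that start at 1 (check_labels is a hypothetical helper, and the label values are illustrative only):

```python
def check_labels(labels, n_classes):
    """Return the labels that fall outside the valid range [0, n_classes)."""
    return [t for t in labels if not 0 <= t < n_classes]


labels = [1, 2, 3, 2, 1]  # illustrative labels that start at 1 instead of 0
n_classes = 3

print(check_labels(labels, n_classes))  # [3]

# Shifting every label down by one brings them into range.
shifted = [t - 1 for t in labels]
print(check_labels(shifted, n_classes))  # []
```
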
RuntimeError: mat1 and mat2 must have the same dtype
- Cause: the arrays were float64
- Solution: cast them to float32
  - df['enc'] = df['enc'].apply(lambda arr: arr.astype(np.float32))
    df['curr_enc'] = df['curr_enc'].apply(lambda arr: arr.astype(np.float32))
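
For context, NumPy creates floating-point arrays as float64 by default, whereas PyTorch layers use float32 weights by default, which is how this mismatch typically appears. A minimal NumPy-only sketch of the cast, with illustrative values:

```python
import numpy as np

arr = np.array([0.1, 0.2, 0.3])  # NumPy defaults to float64
print(arr.dtype)

arr32 = arr.astype(np.float32)   # astype returns a converted copy
print(arr32.dtype)
```
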
RuntimeError: both arguments to matmul need to be at least 1D, but they are 0D and 2D
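=> All of the code ran fine, including trainer.fit(model, data_module), and the only step left was running the test when this error appeared.

The error was raised at trainer.test(model, datamodule=data_module).
To find out why the shapes differed, I tried adding print statements in several places.
Printing the shapes right at the failing line showed torch.Size([4, 128]) and torch.Size([4]) up to batch 534, and then the last batch suddenly printed torch.Size([]) and torch.Size([1, 128]) and the error was raised.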

torch.Size([4, 128])
Testing DataLoader 0:  99%|██████████████████████████████████████████████████████████▋| 532/535 [00:04<00:00, 115.74it/s]torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
Testing DataLoader 0: 100%|██████████████████████████████████████████████████████████▊| 533/535 [00:04<00:00, 115.75it/s]torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
torch.Size([4])
====
torch.Size([4, 128])
Testing DataLoader 0: 100%|██████████████████████████████████████████████████████████▉| 534/535 [00:04<00:00, 115.77it/s]
torch.Size([])
====
torch.Size([1, 128])

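Removing just this last batch should make the error go away, so the solution is to ignore the last batch
by setting drop_last=True when creating the DataLoader:

test_dataloader = DataLoader(test_set, batch_size=batch_size, drop_last=True)

=> This resolved the error!
This error message shows up in many other situations as well, but in my case this was the cause.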
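
The effect of drop_last can be reproduced without PyTorch. The sketch below mimics the batching arithmetic in plain Python; batch_sizes is a hypothetical helper, and the sample count is inferred from the log above (534 full batches of 4 plus one leftover sample).

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of each batch produced when n_samples are split into batches."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the incomplete last batch
    return sizes


n_samples = 534 * 4 + 1  # inferred from the log, not taken from the dataset itself

sizes = batch_sizes(n_samples, 4)
print(len(sizes), sizes[-1])  # 535 1

sizes_dropped = batch_sizes(n_samples, 4, drop_last=True)
print(len(sizes_dropped), sizes_dropped[-1])  # 534 4
```

With drop_last=True the leftover samples are never evaluated, so the test metrics are computed on slightly fewer samples.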

Running from a .py file

An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module: if __name__ == '__main__': freeze_support() ... The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.

Solution:

if __name__ == '__main__':
    from mmengine.runner import Runner

Put all of the remaining code after this line, indented inside the if block.

evaluate error

Errors rarely show up at the test stage, but this time one did.

RuntimeError: Error(s) in loading state_dict for DataParallel: 
Missing key(s) in state_dict: "module.trans.in_block.0.weight".....

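I first suspected the files named later in the message, but the cause turned out to be "DataParallel".
- DataParallel is meant for using two or more GPUs. I was not going to do that, so removing that part of the code made everything run fine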
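
A model wrapped in DataParallel stores its parameters under keys that start with "module.", so a checkpoint saved from an unwrapped model lacks those keys (and the reverse also happens). If the wrapper has to stay, one common workaround is to rename the keys before loading. The sketch below assumes the checkpoint is a plain key-to-value mapping; both helper names are hypothetical, and the values are placeholders rather than real tensors.

```python
from collections import OrderedDict


def add_module_prefix(state_dict, prefix="module."):
    """Prepend the prefix to every key that does not already have it."""
    return OrderedDict(
        (k if k.startswith(prefix) else prefix + k, v) for k, v in state_dict.items()
    )


def strip_module_prefix(state_dict, prefix="module."):
    """Remove the prefix from every key that has it."""
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v) for k, v in state_dict.items()
    )


ckpt = {"trans.in_block.0.weight": "placeholder"}  # placeholder value, not a real tensor

wrapped = add_module_prefix(ckpt)
print(list(wrapped))                       # ['module.trans.in_block.0.weight']
print(list(strip_module_prefix(wrapped)))  # ['trans.in_block.0.weight']
```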

OSError

File "../python3.8/site-packages/pandas/io/common.py", line 600, in check_parent_directory
raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: 'result/origin/2clf'
- This error occurs because the directory does not exist
- https://gentlesark.tistory.com/90
- Solution code:
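```python
import os

# which_data, sampling, n_clf, fold and df_result come from the surrounding script
directory = f'./result/{which_data}/{sampling}/{n_clf}/'

# Create the output directory (including missing parents) before saving
if not os.path.exists(directory):
    os.makedirs(directory)

df_result.to_csv(directory + f'STATENet_30_{fold}_f.csv')
```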
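
The same idea can be written without the explicit existence check: both os.makedirs(path, exist_ok=True) and Path.mkdir(parents=True, exist_ok=True) create missing parent directories and do nothing if the directory already exists. A self-contained sketch that writes a dummy CSV into a temporary directory (the file name and rows are placeholders):

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "result" / "origin" / "2clf"

    # Creates result/, origin/ and 2clf/ in one call; safe to call again.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / "example.csv"  # placeholder file name
    with out_file.open("w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], [1, 2]])  # dummy rows

    saved = out_file.exists()

print(saved)
```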

ValueError

ValueError: expected sequence of length 6 at dim 1 (got 4)
- The input data dimensions differ!
- Very likely a mismatch between NumPy arrays and lists, so always compare the two
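
This message usually means the nested list being converted is ragged, that is, its rows have different lengths. A plain-Python sketch for spotting the mismatch; row_lengths and pad_rows are hypothetical helpers, and padding is only one possible fix (truncating or fixing the upstream preprocessing may suit the data better).

```python
def row_lengths(rows):
    """Return the distinct row lengths found in a nested list, sorted."""
    return sorted({len(r) for r in rows})


def pad_rows(rows, pad_value=0):
    """Pad every row on the right to the length of the longest row."""
    width = max(len(r) for r in rows)
    return [list(r) + [pad_value] * (width - len(r)) for r in rows]


rows = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4]]  # illustrative ragged input

print(row_lengths(rows))            # [4, 6]
print(row_lengths(pad_rows(rows)))  # [6]
```
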
ValueError: not enough values to unpack (expected 2, got 1)
- The shape of the inputs coming out of the batch is the likely problem
- https://discuss.huggingface.co/t/valueerror-not-enough-values-to-unpack-expected-2-got-1/3516
- Fix it with squeeze / unsqueeze
- Or change the shape with stack
- Originally user_feats["input_ids"], user_feats["attention_mask"] and user_feats["token_type_ids"] all came out as torch.Size([128]), but they were expected in a (128, 32) tuple form (32 being the batch size), so I changed them as follows

        for user_feats in batch:
            # This part kept raising the error, so it was modified
            user_feats["input_ids"] = torch.stack([user_feats["input_ids"]] *  len(batch), dim=0)
            user_feats["attention_mask"] = torch.stack([user_feats["attention_mask"]] * len(batch), dim=0)
            user_feats["token_type_ids"] = torch.stack([user_feats["token_type_ids"]] *  len(batch), dim=0)

Time data

datetime-related errors
- df['created_utc'] = df['created_utc'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S'))
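
The format string has to match the text exactly, otherwise strptime raises a ValueError. A standalone sketch with a made-up timestamp; it uses from datetime import datetime, so the call is datetime.strptime rather than datetime.datetime.strptime, and parse_utc is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse_utc(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime object."""
    return datetime.strptime(text, FMT)


print(parse_utc("2023-07-01 12:34:56"))

# A string in a different format raises ValueError.
try:
    parse_utc("2023/07/01")
except ValueError as err:
    print("ValueError:", err)
```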

GPU settings

To use exactly one GPU:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # GPU index to use; set this before CUDA is initialized
os.environ["TOKENIZERS_PARALLELISM"] = "false"
