[PyTorch] Resolving OOM

MinI0123 · March 18, 2023

Boostcamp AI Tech, 5th cohort


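Out of Memory (OOM) errors are hard to fix because it is difficult to tell where they occurred and what the state of memory was beforehand. This post collects ways to deal with the OOM errors you may run into when using PyTorch.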

GPUtil

GPUtil is a module that reports the state of the GPU. Checking whether memory usage grows on every iteration shows whether a memory leak is occurring.

!pip install GPUtil

import GPUtil
GPUtil.showUtilization()
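
The per-iteration check described above could be automated roughly as follows. This is only a sketch: `looks_like_leak`, `read_used_mb`, `run_iteration`, and the thresholds are hypothetical names chosen for illustration, and the reader function is faked here so that the example runs without a GPU. With GPUtil installed, the reader could be something like `lambda: GPUtil.getGPUs()[0].memoryUsed`.

```python
def looks_like_leak(read_used_mb, run_iteration, num_iters=20, warmup=5, tolerance_mb=1.0):
    """Return True if used memory keeps growing after a warm-up phase.

    read_used_mb and run_iteration are hypothetical callables supplied by
    the caller: one returns the used GPU memory in MB, the other runs a
    single training iteration.
    """
    readings = []
    for _ in range(num_iters):
        run_iteration()
        readings.append(read_used_mb())
    steady = readings[warmup:]  # ignore allocations made during warm-up
    return steady[-1] - steady[0] > tolerance_mb


# Fake readers standing in for a real GPU query (assumption for the demo).
growing = iter(range(100, 10_000, 10))   # +10 MB on every iteration
print(looks_like_leak(lambda: next(growing), lambda: None))  # True
print(looks_like_leak(lambda: 500.0, lambda: None))          # False
```

Skipping the first few iterations matters because the first steps of training usually allocate buffers that are legitimately kept afterwards.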

empty_cache

torch.cuda.empty_cache() releases the unused cached memory that PyTorch holds on the GPU, which makes that memory available again. It does not free memory that is still occupied by live tensors.
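
A minimal sketch of how this might look in practice, assuming a CUDA device is available (the function returns None otherwise). The helper name `release_cached_memory` and the tensor size are made up for the example.

```python
import torch


def release_cached_memory():
    """Compare reserved GPU memory before and after empty_cache()."""
    if not torch.cuda.is_available():
        return None  # nothing to show on a CPU-only machine
    x = torch.empty(1024, 1024, device="cuda")
    del x  # the block goes back to PyTorch's cache, not to the driver
    reserved_before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    reserved_after = torch.cuda.memory_reserved()
    return reserved_before, reserved_after


print(release_cached_memory())
```

Calling empty_cache() does not increase the memory PyTorch itself can use; it mainly helps when other processes need the memory or when reading usage numbers from external tools.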

Tensor accumulation

total_loss = 0
for i in range(10000):
  optimizer.zero_grad()
  output = model(input)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step()
  total_loss += loss  # problem: total_loss becomes a tensor tied to every iteration's graph

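Tensor variables that live on the GPU use GPU memory. In the code above, total_loss is itself a tensor attached to the graph, so the computational graph of every iteration is kept in GPU memory in case backward is called on total_loss. This can eventually lead to OOM. Tensor operations inside a loop need care.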
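
One common way to avoid the accumulation is to add a plain Python number instead of the tensor, for example with `loss.item()`. The sketch below uses a tiny made-up model and random data on the CPU purely so that it runs anywhere; the model, data, and hyperparameters are assumptions for illustration.

```python
import torch
from torch import nn

# Toy setup (hypothetical), just enough to run a training loop.
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

total_loss = 0.0
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()  # Python float: no graph is kept alive

print(type(total_loss))
```

Because total_loss stays a float, each iteration's graph can be freed as soon as loss is rebound on the next step.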

del

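Variables that are no longer needed should be deleted, because in Python the variables assigned inside a loop (for, while) do not disappear when the loop ends. They can be removed with del.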
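
The behaviour can be seen with a small example. A plain list stands in for a large intermediate tensor here, and the function name is made up for the demonstration.

```python
def loop_variable_demo():
    for step in range(3):
        buffer = [0.0] * 1000  # stand-in for a large intermediate tensor
    survived = "buffer" in locals()    # the name is still bound after the loop
    del buffer                         # drop the reference explicitly
    deleted = "buffer" not in locals()
    return survived, deleted


print(loop_variable_demo())  # (True, True)
```

With GPU tensors, del only removes the reference; the memory is returned to PyTorch's cache once no other reference to the tensor remains.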

Batch size

If OOM occurs, you can keep reducing the batch size until you find one that fits.
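
The search could be sketched as below. `find_batch_size` and `run_step` are hypothetical names, and the sketch assumes that an OOM surfaces as a RuntimeError whose message contains "out of memory". A fake step function is used so the example runs without a GPU.

```python
def find_batch_size(run_step, start=256):
    """Halve the batch size until run_step(batch_size) stops raising OOM."""
    batch_size = start
    while batch_size >= 1:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # some other error: do not hide it
            batch_size //= 2
    raise RuntimeError("even batch size 1 does not fit")


def fake_step(batch_size):
    # Pretend that anything above 16 samples exhausts the GPU (assumption).
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")


print(find_batch_size(fake_step))  # 16
```

In a real training script, run_step would run one forward and backward pass, and it may help to call torch.cuda.empty_cache() between attempts.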

no_grad

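At inference time (testing after training), it is best to use torch.no_grad(). No graph is built for backward, so the memory that would otherwise pile up for it is never used.

The torch.no_grad class is a Python context manager used to turn off gradient computation.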

Context-Manager

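Available resources are limited, so it is important to release a resource once you are done with it. A context manager provides a way to acquire and release a resource at exactly the intended moments.

Context managers are most commonly used through the with statement. To be usable in a with statement, an object must define __enter__ and __exit__.

__enter__ : called automatically on entering the with block

__exit__ : called on leaving the with block, including when an exception is raised inside it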
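
A small, self-contained illustration of the two methods follows. `ManagedResource` is a made-up class that only records what happens; it is not taken from any library.

```python
class ManagedResource:
    """Toy resource that logs when it is acquired and released."""

    def __init__(self):
        self.is_open = False
        self.events = []

    def __enter__(self):
        self.is_open = True
        self.events.append("enter")
        return self  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False
        self.events.append("exit")
        return False  # do not swallow exceptions raised in the block


resource = ManagedResource()
try:
    with resource as r:
        r.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass

print(resource.events, resource.is_open)  # ['enter', 'body', 'exit'] False
```

The resource is released even though the block raised, which is the main reason to prefer with over manual acquire and release calls.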

Definition

The no_grad class is defined as follows.

class no_grad(_DecoratorContextManager):
    def __init__(self) -> None:
        if not torch._jit_internal.is_scripting():
            super().__init__()
        self.prev = False

    def __enter__(self) -> None:
        self.prev = torch.is_grad_enabled()
        torch.set_grad_enabled(False)

    def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> None:
        torch.set_grad_enabled(self.prev)

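On entering the with block, the current grad mode is saved in prev and the mode is set to False (No-Grad mode). This switches off the autograd engine so that gradients are no longer tracked automatically, which saves memory when gradients are not needed. On leaving the with block, the grad mode is restored to whatever it was just before entering.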
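
The save-and-restore behaviour can be observed directly with torch.is_grad_enabled(). The snippet assumes it starts in the default mode, with gradients enabled.

```python
import torch

states = []
states.append(torch.is_grad_enabled())      # before the block
with torch.no_grad():
    states.append(torch.is_grad_enabled())  # inside the block
states.append(torch.is_grad_enabled())      # restored on exit
print(states)  # [True, False, True]

# Nested use: the inner block restores the mode it found (False), not True.
with torch.no_grad():
    with torch.no_grad():
        pass
    inner_exit = torch.is_grad_enabled()
print(inner_exit)  # False
```

The nested case shows why __exit__ restores self.prev instead of simply switching gradients back on.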

Grad Modes

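Modes that affect how PyTorch's autograd behaves are called grad modes. There are three of them.

1. Default Mode (Grad Mode)
This is the mode used when no other mode has been set, and it is the only mode in which requires_grad takes effect. In the other two modes requires_grad is always overridden to False.

2. No-Grad Mode
In No-Grad mode, operations behave as if every input had requires_grad=False, so they are not recorded in the backward graph. Enable it for operations that do not need to be recorded by autograd. Results computed in No-Grad mode can still be used later in grad mode.

3. Inference Mode
Inference mode is the extreme version of No-Grad mode. As in No-Grad mode, operations are not recorded in the backward graph, and enabling it lets PyTorch speed up the model even further. Unlike No-Grad mode, however, tensors created in inference mode cannot be used afterwards in operations recorded by autograd. (Trying to set requires_grad to True on a tensor created in inference mode raises an error.)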
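
The difference between No-Grad mode and inference mode might be checked with a sketch like the following. The variable names are made up, and the snippet assumes a PyTorch version that provides torch.inference_mode.

```python
import torch

w = torch.ones(3, requires_grad=True)

with torch.no_grad():
    a = torch.ones(3) * 2   # ordinary tensor, just not tracked
with torch.inference_mode():
    b = torch.ones(3) * 2   # inference tensor

# A no-grad result can take part in a recorded computation later on.
(a * w).sum().backward()
no_grad_ok = w.grad is not None

# An inference tensor cannot have requires_grad switched on afterwards.
try:
    b.requires_grad_(True)
    inference_flip_ok = True
except RuntimeError:
    inference_flip_ok = False

print(a.requires_grad, b.is_inference(), no_grad_ok, inference_flip_ok)
```

If an inference tensor is needed in a later autograd computation, cloning it outside inference mode yields a normal tensor.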

Usage

with statement

import torch

x = torch.tensor([1.], requires_grad=True)
with torch.no_grad():
    y = x * 2
y.requires_grad  # False

Decorator

@torch.no_grad()
def doubler(x):
    return x * 2
z = doubler(x)
z.requires_grad # False

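Using a class as a decorator
A class can be used as a decorator if __call__ is implemented as follows.

class DecoratorClass:
    def __init__(self, func):
        self.func = func  # the function being decorated
    def __call__(self, *args, **kwargs):
        # runs before the function
        result = self.func(*args, **kwargs)
        # runs after the function
        return result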

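no_grad is a class that inherits from _DecoratorContextManager. I could not check the source code of _DecoratorContextManager and so cannot be sure, but my guess is that no_grad works as a decorator because _DecoratorContextManager implements __call__. Note that @torch.no_grad() decorates with an instance, so that __call__ would receive the function itself and return a wrapped version of it, rather than storing the function in __init__ as in the class above.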
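
How a single class can act as both a context manager and a decorator can be sketched as follows. `SwitchOff` is a made-up stand-in that toggles a class-level flag in place of the grad mode; it is not PyTorch's implementation, which may differ in its details.

```python
import functools


class SwitchOff:
    """Hypothetical stand-in for no_grad: usable with `with` and as `@SwitchOff()`."""

    enabled = True  # plays the role of the global grad mode

    def __enter__(self):
        self.prev = SwitchOff.enabled
        SwitchOff.enabled = False

    def __exit__(self, exc_type, exc_value, traceback):
        SwitchOff.enabled = self.prev

    def __call__(self, func):
        # Called when an instance is used as a decorator: wrap func so
        # that its body runs inside the context manager.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper


@SwitchOff()
def read_flag():
    return SwitchOff.enabled


inside = read_flag()
outside = SwitchOff.enabled
print(inside, outside)  # False True
```

The standard library offers the same pattern through contextlib.ContextDecorator, which could be used instead of writing __call__ by hand.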