[PyTorch] How to access the layers and weights of a model defined as a class

olxtar · October 5, 2022

Comment :

While working on the Udacity Deep Learning project "Generate Faces", I reached a step that re-initializes the network's weights from a normal distribution.

That means reaching the layers through the network variable defined as a Python class (in the project, the Discriminator and Generator), but I had no idea how to access the individual layers of an instance created from such a class...! So I am writing it down below.

Suppose we have the network below (not yet instantiated).
~~The code is a bit long, but~~ the core structure is as follows.

layer1
- Sequential
  - Conv2d 🔥
layer2
- Sequential
  - Conv2d 🔥
  - BatchNorm2d
layer3
- Sequential
  - Conv2d 🔥
  - BatchNorm2d
fc
- Linear 🔥

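import torch.nn as nn
import torch.nn.functional as F

class SampleNet(nn.Module):
    def __init__(self, conv_dim):
        super(SampleNet, self).__init__()
        self.conv_dim = conv_dim  # stored because forward() uses it when flattening
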
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=conv_dim,
                      kernel_size=4,
                      stride=1,
                      padding=1,
                      bias=False)
        )
            
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=conv_dim,
                      out_channels=conv_dim*2,
                      kernel_size=4,
                      stride=1,
                      padding=1,
                      bias=False),
            nn.BatchNorm2d(conv_dim*2)
        )
            
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=conv_dim*2,
                      out_channels=conv_dim*4,
                      kernel_size=4,
                      stride=1,
                      padding=1,
                      bias=False),
            nn.BatchNorm2d(conv_dim*4)
        )
            
        self.fc = nn.Linear(4*4*conv_dim*4, 1)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = F.relu(self.layer3(x))
            
        # Flatten
        x = x.view(-1, self.conv_dim*4*4*4)
        out = self.fc(x)
            
        return out

The goal is to re-initialize the weights of the layers marked with 🔥 from a normal distribution (mean=0, std=0.02) $\rightarrow$ the Conv2d and Linear layers

00. Instantiate

First, instantiate the network and bind it to the variable t.

t = SampleNet(conv_dim = 64)
print(t)

SampleNet(
  (layer1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
  )
  (layer2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (layer3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (fc): Linear(in_features=4096, out_features=1, bias=True)
)

01. named_children()

1-0. How to use named_children()

Calling .named_children() on the instance t returns a generator object, as shown below.

t.named_children()
>>>

<generator object Module.named_children at 0x000002699AE9A350>

[!] I will write up generator objects another time; for now, let's just consume the generator with a for loop.
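Until that write-up exists, here is a minimal sketch of what "generator" means in practice: items are produced one at a time, and once consumed they are gone. TinyNet is a hypothetical stand-in for SampleNet, kept small for illustration.

```python
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.fc = nn.Linear(8, 1)

net = TinyNet()
gen = net.named_children()            # nothing is iterated yet

first_name, first_child = next(gen)   # pull a single (name, module) pair
remaining = list(gen)                 # consume everything that is left
exhausted = list(gen)                 # a used-up generator yields nothing

print(first_name)                     # layer1
print([n for n, _ in remaining])      # ['fc']
print(exhausted)                      # []
```

So if the pairs are needed more than once, either call named_children() again or store them in a list.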

for name, child in t.named_children():  # the named_children() generator yields (name, child) pairs
    print(f"name: {name}")
    print(child)
    print(type(child))
    print()

name: layer1
Sequential(
  (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
)
<class 'torch.nn.modules.container.Sequential'>

name: layer2
Sequential(
  (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
  (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
<class 'torch.nn.modules.container.Sequential'>

name: layer3
Sequential(
  (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
  (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
<class 'torch.nn.modules.container.Sequential'>

name: fc
Linear(in_features=4096, out_features=1, bias=True)
<class 'torch.nn.modules.linear.Linear'>

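[+] module : n. a module (a unit of a system or program that performs a specific function), a component
🚀 The .named_children() generator yields each layer's name (the attribute name assigned by the user) together with the child module itself (a submodule registered on the parent, not a subclass)

[!] A part wrapped in Sequential is treated as a single unit (the Convolutional and BatchNorm layers sit inside it)
[!] So, as shown below, calling .named_children() on a child module gives access to that child's own children as well
[!] Note that name is the attribute name assigned by the user; inside a Sequential built from positional arguments, the name is the position as a string ('0', '1', ...)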

for name, child in t.named_children():
    print(f"name: {name}")
    print(child)
    
    # if the module is a Sequential, print its child modules too
    if isinstance(child, nn.Sequential):
        for sub_name, sub_child in child.named_children():
            print(f"name: {sub_name}")
            print(sub_child)
            
    print()

name: layer1
Sequential((0): Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False))
name: 0
Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)


name: layer2
Sequential((0): Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
           (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
name: 0
Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
name: 1
BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)


name: layer3
Sequential((0): Conv2d(128, 256, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
           (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
name: 0
Conv2d(128, 256, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
name: 1
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)


name: fc
Linear(in_features=4096, out_features=1, bias=True)
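The nested loop above handles exactly one level of nesting. As a sketch of an alternative, named_modules() walks the whole tree recursively and yields dotted names, so no manual check for Sequential is needed. TinyNet below is a hypothetical, smaller stand-in for SampleNet.

```python
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

net = TinyNet()

# named_children(): direct children only
direct_names = [name for name, _ in net.named_children()]
# named_modules(): the module itself (name '') plus every descendant
all_names = [name for name, _ in net.named_modules()]
# filtering by type reaches the nested Conv2d layers in a single pass
conv_names = [name for name, m in net.named_modules() if isinstance(m, nn.Conv2d)]

print(direct_names)  # ['layer1', 'layer2', 'fc']
print(all_names)     # ['', 'layer1', 'layer1.0', 'layer2', 'layer2.0', 'layer2.1', 'fc']
print(conv_names)    # ['layer1.0', 'layer2.0']
```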

1-1. Changing weights via named_children() (1)

Now let's use named_children() to reach the Linear layer (a child module of the network) and change its weight.

for _, child in t.named_children():
    if isinstance(child, nn.Linear):
        print(f"weight before: {child.weight}")
        
        child.weight.data.fill_(7)
        print(f"weight after: {child.weight}")

weight before: Parameter containing:
tensor([[ 0.0002, -0.0105,  0.0028,  ...,  0.0121, -0.0053, -0.0080]],
       requires_grad=True)
weight after: Parameter containing:
tensor([[7., 7., 7.,  ..., 7., 7., 7.]], requires_grad=True)

First, the named_children() generator and a for loop visit every top-level layer.
When isinstance reports that the layer is an nn.Linear:
1. print the weight before the change
2. set every weight to 7
3. print the weight after the change
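The steps above write through .data. As a sketch of another way to do the same in-place change, the edit can be wrapped in torch.no_grad(), or done with a helper from torch.nn.init. TinyNet is a hypothetical stand-in for SampleNet; the value 7 is the one used above.

```python
import torch
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.fc = nn.Linear(8, 1)

net = TinyNet()

for _, child in net.named_children():
    if isinstance(child, nn.Linear):
        # in-place edit of the Parameter, not recorded by autograd
        with torch.no_grad():
            child.weight.fill_(7.0)
        # nn.init helpers modify the tensor in place as well
        nn.init.constant_(child.bias, 0.0)

print(net.fc.weight)
print(net.fc.bias)
```

Either way the Parameter object stays the same, so anything already holding a reference to it (for example an optimizer) still sees the new values.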

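1-2. Changing weights via named_children() (2)

This time, let's reach the several Convolutional layers (child modules of the Sequential containers) and change their weights.

[!] The weight tensors are too large to print comfortably, so only the [0,0,0] slice is printed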

for _, child in t.named_children():
    if isinstance(child, nn.Sequential):   # descend into the Sequential
    
        for _, sub_child in child.named_children():
            if isinstance(sub_child, nn.Conv2d):
                print(f"weight before: {sub_child.weight[0,0,0]}")
                sub_child.weight.data.fill_(741)
                print(f"weight after: {sub_child.weight[0,0,0]}")

weight before: tensor([ 0.0725,  0.0872,  0.0445, -0.0856], grad_fn=<SelectBackward0>)
weight after: tensor([741., 741., 741., 741.], grad_fn=<SelectBackward0>)

weight before: tensor([-0.0140,  0.0265, -0.0020,  0.0028], grad_fn=<SelectBackward0>)
weight after: tensor([741., 741., 741., 741.], grad_fn=<SelectBackward0>)

weight before: tensor([ 0.0049,  0.0044, -0.0071,  0.0044], grad_fn=<SelectBackward0>)
weight after: tensor([741., 741., 741., 741.], grad_fn=<SelectBackward0>)

1-3. Changing weights via named_children() (3)

~~It's tedious, but finally~~ let's reach the Convolutional and Linear layers and re-initialize their weights from a normal distribution (mean=0, std=0.02)
[+] normal-distribution initialization : .normal_(mean, std)

[!] The Convolutional layers' weight tensors are too large to print comfortably, so only the [0,0,0] slice is printed

for _, child in t.named_children():
    # Linear layer
    if isinstance(child, nn.Linear):
        print(f"weight before initiate: {child.weight}")
        child.weight.data.normal_(0.0, 0.02)
        print(f"weight after initiate: {child.weight}")
    
    # Sequential -> Conv layer
    if isinstance(child, nn.Sequential):
        for _, sub_child in child.named_children():
            if isinstance(sub_child, nn.Conv2d):
                print(f"weight before initiate: {sub_child.weight[0,0,0]}")
                sub_child.weight.data.normal_(0.0, 0.02)
                print(f"weight after initiate: {sub_child.weight[0,0,0]}")

weight before initiate: tensor([ 0.0821, -0.0893,  0.1116,  0.0880], grad_fn=<SelectBackward0>)
weight after initiate: tensor([0.0557, 0.0132, 0.0125, 0.0140], grad_fn=<SelectBackward0>)

weight before initiate: tensor([-0.0125, -0.0125, -0.0167,  0.0039], grad_fn=<SelectBackward0>)
weight after initiate: tensor([-0.0055,  0.0099, -0.0086,  0.0158], grad_fn=<SelectBackward0>)

weight before initiate: tensor([-0.0019,  0.0149,  0.0081,  0.0183], grad_fn=<SelectBackward0>)
weight after initiate: tensor([-0.0094, -0.0180, -0.0129,  0.0009], grad_fn=<SelectBackward0>)

weight before initiate: Parameter containing:
tensor([[ 0.0092, -0.0120, -0.0044,  ...,  0.0151, -0.0154, -0.0080]], requires_grad=True)
weight after initiate: Parameter containing:
tensor([[ 0.0344, -0.0263, -0.0140,  ...,  0.0181, -0.0210,  0.0099]], requires_grad=True)
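Printing a few values does not really show whether the weights now follow N(0, 0.02²). A rough check, sketched below, is to compare the sample mean and standard deviation of each re-initialized weight tensor against the targets. TinyNet is a hypothetical stand-in for SampleNet, and named_modules() is used here only as a shortcut that walks nested modules recursively.

```python
import torch
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

torch.manual_seed(0)
net = TinyNet()

stats = {}
for name, m in net.named_modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        # sample statistics of the freshly drawn weights
        stats[name] = (m.weight.mean().item(), m.weight.std().item())

for name, (mean, std) in stats.items():
    print(f"{name}: mean={mean:.4f}, std={std:.4f}")
```

Layers with few weights (the small Linear layer here) will land further from the targets than the large Conv2d tensors, simply because the sample is smaller.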

02. _modules

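Accessing the (private, underscore-prefixed) attribute _modules on a network returns an OrderedDict that lists the network's child modules $\downarrow$

[+] collections.OrderedDict : a dictionary that keeps its entries in order; see the link for details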

print(type(t._modules))
print(t._modules)

<class 'collections.OrderedDict'>     # type check of t._modules

OrderedDict([('layer1',
              Sequential(
                (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
              )),
             ('layer2',
              Sequential(
                (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
                (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )),
             ('layer3',
              Sequential(
                (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
                (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )),
             ('fc', Linear(in_features=4096, out_features=1, bias=True))])

Using the layer 'names' shown above as keys (just like indexing a Python dictionary) gives access to each layer $\downarrow$

t._modules['layer1']
>>>

Sequential((0): Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False))

t._modules['layer2']
>>>

Sequential((0): Conv2d(64, 128, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
           (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))

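[!] You cannot chain plain dictionary lookups like _modules['...']['...']; you have to go through _modules again at every level
This is because the lookup returns a module, not a nested dictionary, and a Sequential cannot be indexed with a string key
[!] It is a Sequential module!

This returns the layer named '0' inside 'layer1' directly!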
type(t._modules['layer1']._modules['0'])
>>>

torch.nn.modules.conv.Conv2d

t._modules['layer2']._modules['1']
>>>

BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
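Since _modules is a private attribute, it may be worth knowing the public routes to the same object. The sketch below compares three of them: the _modules chain used above, attribute access plus integer indexing on the Sequential, and get_submodule() with a dotted path (this last one assumes PyTorch 1.9 or later). TinyNet is a hypothetical stand-in for SampleNet.

```python
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

net = TinyNet()

via_private = net._modules['layer2']._modules['0']  # the route used above
via_index = net.layer2[0]                           # attribute + integer index
via_path = net.get_submodule('layer2.0')            # dotted path, assumes PyTorch >= 1.9

print(via_private is via_index is via_path)         # True
print(type(net.layer2[1]).__name__)                 # BatchNorm2d
```

All three return the very same module object, so a weight changed through one of them is changed for all.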

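2-1. Changing weights via _modules

(As just mentioned) the dictionary lookup returns the module object itself rather than something like a str, so weight, bias, and so on can be accessed directly.
[!] Because this is an ordered-dictionary lookup rather than a generator + for loop, it is convenient when you want one specific layer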

print(t._modules['layer1']._modules['0'])
print(t._modules['layer1']._modules['0'].weight)
>>>

Conv2d(3, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
Parameter containing:
tensor([[[[ 5.5747e-02,  1.3203e-02,  1.2492e-02,  1.4003e-02],
          ...
          [-1.3805e-04,  2.3481e-02,  1.0857e-02,  3.2397e-02]]]],
       requires_grad=True)

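Now let's overwrite a weight through _modules (just as we did for specific layers with named_children())
[+] Only the Convolutional layer inside the second layer is changed; here its weight is filled with the constant 741, and swapping fill_(741) for normal_(0.0, 0.02) would give the normal-distribution initialization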

print(f"Before Initiate: {t._modules['layer2']._modules['0'].weight[0,0,0]}")

t._modules['layer2']._modules['0'].weight.data.fill_(741)

print(f"After Initiate: {t._modules['layer2']._modules['0'].weight[0,0,0]}")
>>>

Before Initiate: tensor([-0.0100,  0.0191, -0.0037, -0.0068], grad_fn=<SelectBackward0>)
After Initiate: tensor([741., 741., 741., 741.], grad_fn=<SelectBackward0>)
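The snippet above overwrites the weight with a constant. For the actual goal, a normal-distribution initialization of just that one layer, a sketch under the same access pattern could look like this. TinyNet is a hypothetical stand-in for SampleNet; the clones are only there to show that the other layer is left alone.

```python
import torch
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

torch.manual_seed(0)
net = TinyNet()

conv1 = net._modules['layer1']._modules['0']
conv2 = net._modules['layer2']._modules['0']
conv1_before = conv1.weight.detach().clone()
conv2_before = conv2.weight.detach().clone()

# re-draw only the second layer's Conv2d weight from N(0, 0.02^2)
conv2.weight.data.normal_(0.0, 0.02)

print(torch.equal(conv1.weight, conv1_before))  # True  (untouched)
print(torch.equal(conv2.weight, conv2_before))  # False (re-initialized)
```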

03. __class__.__name__ + apply + find

Reference 1.

See the Udacity project solution on GitHub (here)

A network is one module that holds several child modules.
Calling .apply(function) on it therefore calls function on every submodule, recursively, and finally on the network itself.

Applying a function like the one below to the network initializes the weights of specific layers (Convolutional, Linear) from a normal distribution.

from torch.nn import init

def weights_init_normal(m):
    classname = m.__class__.__name__
    isConv = classname.find('Conv') != -1
    isLinear = classname.find('Linear') != -1
    
    # parentheses make the intent explicit: `and` binds tighter than `or`
    if hasattr(m, 'weight') and (isConv or isLinear):
        init.normal_(m.weight.data, 0.0, 0.02)
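To see what .apply() actually visits, and why the class-name check with find works, here is a small sketch that records the class name of every module passed to the function. TinyNet is a hypothetical stand-in for SampleNet.

```python
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

net = TinyNet()

# apply() calls the function on the children first, then on the parent
visited = []
net.apply(lambda m: visited.append(m.__class__.__name__))
print(visited)
# ['Conv2d', 'Sequential', 'Conv2d', 'BatchNorm2d', 'Sequential', 'Linear', 'TinyNet']

# str.find returns the index of the first match, or -1 when there is none
print('Conv2d'.find('Conv'))           # 0
print('ConvTranspose2d'.find('Conv'))  # 0
print('BatchNorm2d'.find('Conv'))      # -1
```

The containers (Sequential, the network itself) are visited too, which is why the init function has to filter by class name or type before touching weight.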

Reference 2.

See the Udacity project solution on GitHub (here)

def weights_init_normal(m):
    classname = m.__class__.__name__
    
    if hasattr(m, 'weight') and (classname.find('Conv') != -1 or classname.find('Linear') != -1):
        m.weight.data.normal_(0.0, 0.02)
        
        # The bias terms, if they exist, are set to 0
        if hasattr(m, 'bias') and m.bias is not None:
            m.bias.data.zero_()
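The function only takes effect once it is passed to .apply(). A minimal usage sketch, assuming a hypothetical TinyNet in place of SampleNet, with a few checks that the Conv2d/Linear layers were re-initialized while BatchNorm2d was left alone:

```python
import torch
import torch.nn as nn

# TinyNet is a hypothetical stand-in for SampleNet (assumption, not from the project).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, bias=False))
        self.layer2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=4, bias=False),
            nn.BatchNorm2d(16),
        )
        self.fc = nn.Linear(16, 1)

def weights_init_normal(m):
    classname = m.__class__.__name__
    if hasattr(m, 'weight') and (classname.find('Conv') != -1 or classname.find('Linear') != -1):
        m.weight.data.normal_(0.0, 0.02)
        if hasattr(m, 'bias') and m.bias is not None:
            m.bias.data.zero_()

torch.manual_seed(0)
net = TinyNet()
bn_weight_before = net.layer2[1].weight.detach().clone()

net.apply(weights_init_normal)   # visits every submodule, then net itself

conv_std = net.layer2[0].weight.std().item()
print(f"conv std: {conv_std:.4f}")                          # should be close to 0.02
print(torch.all(net.fc.bias == 0).item())                   # True
print(torch.equal(net.layer2[1].weight, bn_weight_before))  # True: BatchNorm2d is skipped
```

In the project this would be called once per network right after instantiation, before any training step.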