๐ŸŽฒ[AI] ํ–‰๋ ฌ๊ณฑ๊ณผ Linear layer

manduยท2025๋…„ 4์›” 27์ผ


ํ•ด๋‹น ๊ธ€์€ FastCampus - '[skill-up] ์ฒ˜์Œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ์œ ์น˜์› ๊ฐ•์˜๋ฅผ ๋“ฃ๊ณ ,
์ถ”๊ฐ€ ํ•™์Šตํ•œ ๋‚ด์šฉ์„ ๋ง๋ถ™์—ฌ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.

1. ํ–‰๋ ฌ ๊ณฑ (Matrix Multiplication)


1.1 ํ–‰๋ ฌ ๊ณฑ์ด๋ž€?

  • Built from inner products (dot products): each entry of the result is the
    dot product of a row of the first matrix with a column of the second
  • One of the core operations in deep learning
  • The operation is only defined up to matrices (plus vectors and scalars);
    operations between higher-order tensors are computed by reusing this matrix operation

  • ๊ฐ Matrix์˜ ์ž…๋ ฅ, ์ถœ๋ ฅ ์ฐจ์›์˜ shape์„ ์•Œ์•„์•ผ ์™œ ์—๋Ÿฌ๊ฐ€ ๋‚ฌ๋Š”์ง€ ๋””๋ฒ„๊น… ๊ฐ€๋Šฅ
  • PyTorch ์ฝ”๋“œ:
z = torch.matmul(x, y)

# x*y๋Š” elemetalwise ๊ณฑ์…ˆ์ด๋‹ค!

1.2 ๋ฒกํ„ฐ-ํ–‰๋ ฌ ๊ณฑ์…ˆ (Vector-Matrix Multiplication)

  • ์ž…๋ ฅ, ์ถœ๋ ฅ ์ฐจ์› ๋งž์ถฐ์ฃผ๋Š” ์›๋ฆฌ๋Š” ๋™์ผํ•จ


1.3 ๋ฐฐ์น˜ ํ–‰๋ ฌ ๊ณฑ์…ˆ (Batch Matrix Multiplication, BMM)

  • ํ…์„œ ๊ณฑ์ด๋ผ๋Š” ๊ฑด ์ •์˜๋˜์–ด ์žˆ์ง€ ์•Š์Œ โ†’ ํ…์„œ๋ฅผ ์—ฌ๋Ÿฌ๊ฐœ์˜ ํ–‰๋ ฌ ์Œ์ด๋ผ๊ณ  ์ƒ๊ฐ
  • ๊ฐ™์€ ๊ฐฏ์ˆ˜์˜ ํ–‰๋ ฌ ์Œ๋“ค์— ๋Œ€ํ•ด์„œ ๋ณ‘๋ ฌ๋กœ ํ–‰๋ ฌ ๊ณฑ ์‹คํ–‰
  • (n, h) X (h, m) ์ด๋‹ˆ๊นŒ h๊ฐ€ ๊ฐ™์•„์•ผ ํ•จ
  • N ๋˜ํ•œ ๊ฐ™์•„์•ผ ํ•จ ์ฆ‰, ๋ณ‘๋ ฌ ๊ณฑ์„ N๋ฒˆ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ
  • 4์ฐจ์›์ด๋ผ๋ฉด ? 3์ฐจ์›๊ณผ ์‚ฌ์‹ค ๋™์ผํ•˜๊ฒŒ ๊ฐ„์ฃผ ๊ฐ€๋Šฅ
    (N1, N2, n, h) x (N1, N2, h, m) == (N1 โˆ—* N2, n, h) x (N1 โˆ—* N2, h, m)
  • ์ด๋Ÿฌํ•œ ๋ณ‘๋ ฌ ์—ฐ์‚ฐ์ด ๋งค์šฐ ๋งŽ์•„์ง€๋ฉด์„œ GPU์˜ ์ง„๊ฐ€๊ฐ€ ๋ฐœํœ˜๋˜๋Š” ๊ฒƒ

2. Linear Layer


2.1 What Is a Linear Layer?

  • ๋”ฅ๋Ÿฌ๋‹ ์‹ ๊ฒฝ๋ง์—์„œ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๊ตฌ์„ฑ ์š”์†Œ๋กœ, ๋‚ด๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ(๊ฐ€์ค‘์น˜ W์™€ ํŽธํ–ฅ b)์— ๋”ฐ๋ผ ์„ ํ˜• ๋ณ€ํ™˜์„ ์ˆ˜ํ–‰ํ•˜๋Š” ํ•จ์ˆ˜
  • ์ฆ‰, ๊ฐ€์ƒ์˜ ํ•จ์ˆ˜ f*๋ฅผ ๋ชจ์‚ฌํ•˜๊ธฐ ์œ„ํ•œ ๊ตฌ์„ฑ ์š”์†Œ!
  • ๋ชจ๋“  ์ž…๋ ฅ์˜ ๋…ธ๋“œ๋Š” ๋ชจ๋“  ์ถœ๋ ฅ์˜ ๋…ธ๋“œ์™€ ์—ฐ๊ฒฐ์ด ๋˜์–ด์žˆ๊ธฐ ๋•Œ๋ฌธ์—,
    Fully Connected (FC) Layer๋ผ๊ณ ๋„ ๋ถˆ๋ฆผ
  • ์›ํ•˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ๋ฑ‰๋„๋ก ํ•˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’์„ ์ฐพ๋Š” ๊ฒŒ ๋ฐ”๋กœ ํ•™์Šต

๋…ธ๋“œ == ํผ์…‰ํŠธ๋ก  == ๋‰ด๋Ÿฐ


2.2 ์ž‘๋™ ๋ฐฉ์‹

  • ๊ฐ ์ž…๋ ฅ ๋…ธ๋“œ๋“ค์— weight(๊ฐ€์ค‘์น˜)๋ฅผ ๊ณฑํ•˜๊ณ  ๋ชจ๋‘ ํ•ฉ์นœ ๋’ค, bias(ํŽธํ–ฅ)์„ ๋”ํ•จ
    โ†’ ํ–‰๋ ฌ ๊ณฑ์œผ๋กœ ๊ตฌํ˜„ ๊ฐ€๋Šฅ

  • n์ฐจ์›์—์„œ m ์ฐจ์›์œผ๋กœ์˜ ์„ ํ˜• ๋ณ€ํ™˜ ํ•จ์ˆ˜


2.3 ์ˆ˜์‹ ํ‘œํ˜„ ๋ฐฉ๋ฒ•

  • x๋ฅผ ๋ฏธ๋‹ˆ๋ฐฐ์น˜์— ๊ด€๊ณ„ ์—†์ด ๋‹จ์ˆœํžˆ ํ•˜๋‚˜์˜ ๋ฒกํ„ฐ๋กœ ๋ณผ ๊ฒฝ์šฐ

  • x๋ฅผ ๋ฏธ๋‹ˆ๋ฐฐ์น˜(N๊ฐœ) ํ…์„œ๋กœ ํ‘œํ˜„ํ•  ๊ฒฝ์šฐ (์ฃผ๋กœ ์‚ฌ์šฉ)
    โ†’ ๋ฏธ๋‹ˆ๋ฐฐ์น˜์— ๋Œ€ํ•œ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๊ฐ€ ๊ฐ€๋Šฅ


2.4 PyTorch Code

  • ๋‹จ์ˆœ Linear layer ๋งŒ๋“ค์–ด ๋ณด๊ธฐ
W = torch.FloatTensor([[1, 2],
                       [3, 4],
                       [5, 6]])
b = torch.FloatTensor([2, 2])
x = torch.FloatTensor([[1, 1, 1],
                       [2, 2, 2],
                       [3, 3, 3],
                       [4, 4, 4]])

print(W.size(), b.size(), x.size())

def linear(x, W, b):
    y = torch.matmul(x, W) + b
    
    return y

y = linear(x, W, b)

print(y.size())
  • ํ•™์Šต์ด ๊ฐ€๋Šฅํ•œ Linear layer ๋งŒ๋“ค์–ด ๋ณด๊ธฐ

nn.Module

  • PyTorch์—์„œ ๋ชจ๋ธ์„ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•œ base ํด๋ž˜์Šค (์ถ”์ƒ ํด๋ž˜์Šค๋ผ๊ณ  ํ•˜๊ธฐ์—” init๊ณผ forward ํ•จ์ˆ˜๋ฅผ ์ œ์™ธํ•˜๊ณ ๋Š” ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ)
  • ํŒŒ๋ผ๋ฏธํ„ฐ ๊ด€๋ฆฌ, ํŒŒ์ดํ”„๋ผ์ธ ์ฒ˜๋ฆฌ, ์ž๋™ ๋ฏธ๋ถ„, ๋ชจ๋ธ ํ›ˆ๋ จ ๋ฐ ํ‰๊ฐ€ ๊ธฐ๋Šฅ์„ ์‰ฝ๊ฒŒ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์ƒ์† ๋ฐ›๋Š” ํด๋ž˜์Šค
  • init๊ณผ forward๋Š” override ํ•„์ˆ˜!

forward

  • ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์ˆœ์ „ํŒŒ (forward pass)๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฉ”์„œ๋“œ
  • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๋„คํŠธ์›Œํฌ๋ฅผ ํ†ต๊ณผ์‹œํ‚ค๋ฉฐ ์ถœ๋ ฅ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋Š” ๊ณผ์ •

backward

  • ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์—ญ์ „ํŒŒ (backward pass)๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฉ”์„œ๋“œ
  • ์ถœ๋ ฅ๊ณผ ์‹ค์ œ ๊ฐ’์˜ ์ฐจ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ธฐ์šธ๊ธฐ(gradient)๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ๋ฅผ ์œ„ํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณต
class MyLinear(nn.Module): ## nn.Module: __init__ and forward must be overridden!

    def __init__(self, input_dim=3, output_dim=2):
        super().__init__() # initialize the parent class first

        self.input_dim = input_dim
        self.output_dim = output_dim

        # Wrapping in nn.Parameter registers the tensors as parameters,
        # which makes them trainable
        self.W = nn.Parameter(torch.FloatTensor(input_dim, output_dim))
        self.b = nn.Parameter(torch.FloatTensor(output_dim))

    # You should override the 'forward' method to implement the details.
    # The input arguments and outputs can be designed as you wish.
    def forward(self, x):
        # |x| = (batch_size, input_dim)   # annotating shapes like this is a very good habit!
        y = torch.matmul(x, self.W) + self.b
        # |y| = (batch_size, input_dim) * (input_dim, output_dim)
        #     = (batch_size, output_dim)
        
        return y

linear = MyLinear(3, 2)

y = linear(x)

print(y.size())

for p in linear.parameters():
    print(p)

# Parameter containing:
# tensor([[-0.4768,  0.3792,  0.2139],
#         [-0.2055, -0.3338, -0.0495]], requires_grad=True)
# Parameter containing:
# tensor([-0.3775, -0.5481], requires_grad=True)
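
To see where backward fits in, here is a minimal sketch; the MSE loss below is chosen only for illustration:

```python
import torch
import torch.nn as nn

linear = nn.Linear(3, 2)
x = torch.randn(4, 3)
target = torch.randn(4, 2)

y = linear(x)                       # forward pass
loss = ((y - target) ** 2).mean()   # simple MSE loss, for illustration only
loss.backward()                     # backward pass: autograd computes gradients

# Each parameter now carries a gradient that an optimizer can use
print(linear.weight.grad.size())    # torch.Size([2, 3])
print(linear.bias.grad.size())      # torch.Size([2])
```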
  • ์€ ์‚ฌ์‹ค nn.Linear ํด๋ž˜์Šค๊ฐ€ ์ด๋ฏธ ๋งŒ๋“ค์–ด์ ธ ์žˆ์Œ
linear = nn.Linear(3, 2)

y = linear(x) # forward is also run automatically

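A quick sketch of the shapes involved; note that nn.Linear stores its weight as (output_dim, input_dim) and computes y = xWᵀ + b internally:

```python
import torch
import torch.nn as nn

linear = nn.Linear(3, 2)

print(linear.weight.size())   # torch.Size([2, 3]) -- (output_dim, input_dim)
print(linear.bias.size())     # torch.Size([2])

x = torch.randn(4, 3)
y = linear(x)                 # __call__ runs forward() automatically
print(y.size())               # torch.Size([4, 2])
```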

3. Summary

ํ•ญ๋ชฉ์„ค๋ช…
ํ–‰๋ ฌ๊ณฑ๋”ฅ๋Ÿฌ๋‹์˜ ๊ธฐ๋ณธ ์—ฐ์‚ฐ. ์ž…๋ ฅ๊ณผ ๊ฐ€์ค‘์น˜์˜ ๋‚ด์ 
BMM์—ฌ๋Ÿฌ ํ–‰๋ ฌ ์Œ์„ ๋ณ‘๋ ฌ๋กœ ๊ณฑํ•˜๋Š” ์—ฐ์‚ฐ
Linear Layer์„ ํ˜•๋ณ€ํ™˜์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์‹ ๊ฒฝ๋ง์˜ ๊ธฐ๋ณธ ๊ตฌ์„ฑ
์ˆ˜์‹y = xW + b ํ˜•ํƒœ์˜ ์„ ํ˜•ํ•จ์ˆ˜
ํŒŒ๋ผ๋ฏธํ„ฐW (๊ฐ€์ค‘์น˜), b (ํŽธํ–ฅ)

4. ์‹ค์ „์—์„œ ์ค‘์š”ํ•œ ์ 

  • Linear Layer์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์ธ W์™€ b๋ฅผ ์ž˜ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ
  • ์ด ๊ณผ์ •์„ ํ†ตํ•ด ์‹ ๊ฒฝ๋ง์€ ์ฃผ์–ด์ง„ ์ž…๋ ฅ์— ๋Œ€ํ•ด ์›ํ•˜๋Š” ์ถœ๋ ฅ์„ ๋‚ด๋„๋ก ํ•™์Šต๋จ
