[EDA/Python] Numpy! Numpy What and Why? 📊

SengMin Youn (윤성민) · October 21, 2023

What is Numpy?

Lately I'm in a good mood, because the math I have been studying almost every day for the past year seems to be paying off. Since I recently started studying machine learning in earnest, I have rarely been stuck because of the math (problems caused by being a few IQ points short are frequent, though).

Anyway, Numpy is a library that makes operations on 'multidimensional arrays' easy. For numerical work it is much faster than plain Python lists or dictionaries. For 'gradient descent' in particular, functions such as np.dot or np.matmul carry out the matrix operations far faster than a for loop that updates the parameters one by one. I'll go into more detail in a later machine learning post.
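To make the gradient descent remark a bit more concrete, here is a minimal sketch. The model is my own assumption for illustration (linear regression with mean squared error on made-up data), and the names X, y, w, lr, grad_loop, and grad_vectorized are hypothetical. It only shows that the loop version and the np.dot version compute the same gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # made-up data: 200 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])      # hypothetical "true" weights
y = X @ true_w                           # noise-free targets
lr = 0.1                                 # assumed learning rate


def grad_loop(X, y, w):
    """MSE gradient computed with explicit Python loops."""
    n, d = X.shape
    grad = np.zeros(d)
    for i in range(n):
        pred = 0.0
        for j in range(d):
            pred += X[i, j] * w[j]
        for j in range(d):
            grad[j] += 2.0 * (pred - y[i]) * X[i, j] / n
    return grad


def grad_vectorized(X, y, w):
    """Same gradient, written with np.dot."""
    residual = np.dot(X, w) - y
    return 2.0 * np.dot(X.T, residual) / len(y)


w = np.zeros(3)
# Both versions agree on the gradient at the starting point.
assert np.allclose(grad_loop(X, y, w), grad_vectorized(X, y, w))

# Plain gradient descent using the vectorized gradient.
for _ in range(200):
    w = w - lr * grad_vectorized(X, y, w)

print(w)  # should end up close to true_w
```

The loop version touches every element from Python, while the np.dot version hands the whole computation to compiled code, which is where the speedup comes from.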

Why Numpy?

'Vectorization' is very important for machine learning algorithms. Take a look at the code below.

import numpy
import time
size = 1000000

list1 = range(size)
list2 = range(size)
 
array1 = numpy.arange(size)  
array2 = numpy.arange(size)
 
initialTime = time.time()
resultantList = [(a * b) for a, b in zip(list1, list2)]
 
print("Time taken by Lists :", 
      (time.time() - initialTime),
      "seconds")
 
initialTime = time.time()
resultantArray = array1 * array2
 
print("Time taken by NumPy Arrays :",
      (time.time() - initialTime),
      "seconds")
> Time taken by Lists : 1.1984527111053467 seconds
  Time taken by NumPy Arrays : 0.13434123992919922 seconds

If you 'vectorize' the lists and treat them like matrices, you get the result much faster: about nine times faster in the run above.

Basic syntax

  • Creating a matrix
import numpy as np

B = np.array([[0, 1, 2, 3], 
              [4, 5, 6, 7], 
              [8, 9, 10, 11]])
  • Checking the shape of B
print(B.shape)
> (3, 4)
  • Creating a 3 x 4 zero matrix and a 3 x 3 identity matrix
print(np.zeros((3, 4)))
print(np.eye(3))
> [[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]
  [[1. 0. 0.]
   [0. 1. 0.]
   [0. 0. 1.]]

Indexing and Slicing

Z = np.array([[0, 1, 2, 3, 4, 5],
              [10, 11, 12, 13, 14, 15],
              [20, 21, 22, 23, 24, 25],
              [30, 31, 32, 33, 34, 35],
              [40, 41, 42, 43, 44, 45],
              [50, 51, 52, 53, 54, 55]])

# Construct `Z_green`, `Z_red`, `Z_orange`, and `Z_cyan`:
Z_green = Z[(2,4), ::2]
Z_red = Z[:, 2]
Z_orange = Z[0, 3:5]
Z_cyan = Z[(4,5), 4:6]

Nothing too difficult here. Think of it as similar to list indexing.
In terms of memory, results of plain slicing such as Z_red and Z_orange are just 'views': assigning a slice to a variable does not allocate new memory for the data. Z_green and Z_cyan are different, because indexing the rows with a tuple such as (2, 4) is advanced indexing, which returns a copy. If you want a new, independent object from a slice, call copy(), as in Z[:, 2].copy().
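A quick way to check which results are views is np.shares_memory. The sketch below rebuilds the same Z values with a list comprehension (my shortcut, not part of the original code), and Z_red_copy is a name I made up for illustration.

```python
import numpy as np

# Same values as Z above: entry at (r, c) is 10*r + c.
Z = np.array([[10 * r + c for c in range(6)] for r in range(6)])

Z_red = Z[:, 2]               # basic slicing
Z_green = Z[(2, 4), ::2]      # tuple of row indices: advanced indexing
Z_red_copy = Z[:, 2].copy()   # explicit copy

print(np.shares_memory(Z, Z_red))       # True: a view
print(np.shares_memory(Z, Z_green))     # False: a copy
print(np.shares_memory(Z, Z_red_copy))  # False: a copy

# Writing through a view changes the original array.
Z_red[0] = -1
print(Z[0, 2])   # -1

# Writing to a copy leaves the original untouched.
Z_green[0, 0] = -5
print(Z[2, 0])   # still 20
```

So a view is cheap, but modifying it modifies Z as well. Reach for copy() when that is not what you want.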

Indirect Addressing

You can also index with an array made up of a 'Boolean mask' or of 'indices'.

from numpy.random import default_rng 
rng = default_rng(12345) 

x = rng.integers(0, 20, 15) 
print(x)
> [13 4 15 6 4 15 12 13 19 7 16 6 11 11 4]

inds = np.array([3, 7, 8, 12])  # picks x[3], x[7], x[8], x[12]
print(x[inds])
> [6 13 19 11]

mask_mult_3 = (x > 0) & (x % 3 == 0)
print("x:", x)
print("mask_mult_3:", mask_mult_3)
print("==> x[mask_mult_3]:", x[mask_mult_3]) 
>x: [13 4 15 6 4 15 12 13 19 7 16 6 11 11 4]
>mask_mult_3: [False False  True  True False  True  True False False False False  True
 False False False]
>==> x[mask_mult_3]: [15 6 15 12 6]
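The two styles are interchangeable. As a small sketch (x is typed in by hand from the printed output above, and mask, idx, and y are names I made up), np.flatnonzero turns a Boolean mask into an index array, and a mask can also sit on the left-hand side of an assignment.

```python
import numpy as np

# Values copied from the printed x above.
x = np.array([13, 4, 15, 6, 4, 15, 12, 13, 19, 7, 16, 6, 11, 11, 4])

mask = (x > 0) & (x % 3 == 0)   # Boolean mask, same condition as mask_mult_3
idx = np.flatnonzero(mask)      # positions where the mask is True: 2, 3, 5, 6, 11

# Indexing by mask and by index array selects the same elements.
assert np.array_equal(x[mask], x[idx])

# A mask can also be used to assign: zero out every multiple of 3 in a copy.
y = x.copy()
y[mask] = 0
print(y)
```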

Application

Let's write an algorithm that finds all primes up to 20. The Sieve of Eratosthenes can be written with numpy. Honestly, numpy is not necessary here, and in a coding test I would probably just use a list.

from math import sqrt
def sieve(n):

    is_prime = np.empty(n+1, dtype=bool) # the "sieve"

    # Initial values
    is_prime[0:2] = False # {0, 1} are _not_ considered prime
    is_prime[2:] = True # All other values might be prime

    m = int(sqrt(n)) + 1
    
    for i in range(2, m):
        if is_prime[i]:
            for j in range(i+i, n+1, i):
                is_prime[j] = False 
       
    return is_prime

# Prints your primes
print("==> Primes through 20:\n", np.nonzero(sieve(20))[0])
>==> Primes through 20:  
 [2 3 5 7 11 13 17 19]
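Since this post is about vectorization, the inner loop can also be replaced with a single slice assignment. This is only a sketch of a variant, not the version above: sieve_sliced is a name I made up, it uses math.isqrt instead of sqrt, and it starts crossing out at i*i rather than i+i (smaller multiples of i were already crossed out by smaller primes).

```python
import numpy as np
from math import isqrt


def sieve_sliced(n):
    """Boolean array where entry k is True if k is prime, for 0 <= k <= n."""
    is_prime = np.ones(n + 1, dtype=bool)
    is_prime[:2] = False              # 0 and 1 are not prime

    for i in range(2, isqrt(n) + 1):
        if is_prime[i]:
            # Cross out i*i, i*i + i, i*i + 2i, ... in one assignment.
            is_prime[i * i::i] = False

    return is_prime


primes = np.flatnonzero(sieve_sliced(20))
print(primes)  # the primes 2, 3, 5, 7, 11, 13, 17, 19
```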