NumPy: Numerical Python

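- A high-performance scientific computing package for Python

NumPy is a Python package for numerical operations on vectors, matrices and similar data, used in numerical analysis and statistics work.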

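NumPy features.

It is faster and more memory-efficient than a plain Python list.
Arrays can be processed without a for loop or a list comprehension.
It provides a wide range of linear algebra functionality.
Internally it is implemented largely in C, and its linear algebra routines can delegate to BLAS/LAPACK libraries (historically written in Fortran), which is why operations are fast.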
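
As a small illustration of processing an array without a loop, the sketch below doubles every element two ways. The names `values`, `doubled_list` and `doubled_array` are made up for this example.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: the list comprehension visits every element explicitly.
doubled_list = [v * 2 for v in values]

# NumPy: the multiplication is applied to the whole array at once.
doubled_array = np.array(values) * 2

print(doubled_list)   # [2, 4, 6, 8]
print(doubled_array)  # [2 4 6 8]
```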

This post was written while studying, based on the lecture by Professor Sungchul Choi (최성철) on Naver Boostcourse.

1. import

import numpy as np

Because the module is used so routinely, importing it under the alias np is the convention.

2. Creating a numpy array

import numpy as np

test_array = np.array(["1", "4", 5, 8], float)	# float type array
# As in the example above, elements given as strings are converted to float automatically.

# test_array = np.array(["1", "4", 5, 8], np.float32)
# Usually the dtype is given with an explicit size, like this.

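Python normally decides the type of a value at run time, but a numpy array does not allow this. (It does not support dynamic typing.)
Every element of an array therefore has the same data type, and the values are stored in a contiguous C-style array.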
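
The single-dtype rule can be checked directly. This is a minimal sketch; the variable names `mixed` and `ints` are only illustrative.

```python
import numpy as np

# Mixing int and float values: every element is stored as a float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                # float64

# With an explicit dtype, each element takes a fixed number of bytes.
ints = np.array([1, 2, 3], dtype=np.int8)
print(ints.dtype, ints.itemsize)  # int8 1

# A float assigned into an integer array is truncated to fit the dtype.
ints[0] = 7.9
print(ints)                       # [7 2 3]
```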

3. Shape

The shape describes how the dimensions of a numpy array are laid out.

test_array = np.array(["1", "3", 5, 7, 9], float)
print(test_array)
print(test_array.shape)

[1. 3. 5. 7. 9.]
# tuple type return
(5,)

The case above is a vector shape.

test_array = np.array([["1", "3", 5, 7, 9]], float)
print(test_array)
print(test_array.shape)

[[1. 3. 5. 7. 9.]]
(1, 5)

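This case is a matrix shape with 1 row and 5 columns.

If such matrices are stacked into a 3-dimensional tensor with a depth of 3,
the shape becomes (3, 1, 5).
(Each new dimension is added at the front, so the existing entries of the shape move one place back: (5,) -> (1, 5) -> (3, 1, 5))

That is, for a tensor with depth 3, 4 rows and 5 columns: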

test_array = np.array([[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]],
			[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]],
			[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]]], float)
print(test_array)
print(test_array.shape)

[[[1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]]

 [[1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]]

 [[1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]
  [1. 3. 5. 7. 9.]]]
(3, 4, 5)

3.1 ndim & size

ndim returns the number of dimensions,
size returns the total number of elements.

test_array = np.array([[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]],
			[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]],
			[[1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9], [1, 3, 5, 7, 9]]], float)
print(test_array.ndim)
print(test_array.size)

3	# number of dimension
60	# number of data

3.2 reshape

reshape changes the shape of a numpy array. (The number of elements stays the same.)

before = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]]], np.float32)
print("before: ")
print(before)

after = before.reshape(2, 2, 2)
print("after: ")
print(after)

before: 
[[[1. 2. 3. 4.]
  [5. 6. 7. 8.]]]
 
after: 
[[[1. 2.]
  [3. 4.]]

 [[5. 6.]
  [7. 8.]]]

When the length of one axis is not known exactly during a reshape, pass -1 for it
and the length is inferred from the size of the array.

before = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]]], np.float32)
print("before: ")
print(before)

after = before.reshape(-1, 2)
print("after: ")
print(after)

before: 
[[[1. 2. 3. 4.]
  [5. 6. 7. 8.]]]
after: 
[[1. 2.]
 [3. 4.]
 [5. 6.]
 [7. 8.]]
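
The -1 placeholder only works when the remaining lengths divide the element count evenly. A minimal sketch (the names `data` and `error_message` are illustrative):

```python
import numpy as np

data = np.arange(8)                  # 8 elements

print(data.reshape(2, -1).shape)     # (2, 4): -1 is inferred as 8 / 2
print(data.reshape(-1, 2, 2).shape)  # (2, 2, 2)

# 8 elements cannot be split into rows of 3, so reshape raises ValueError.
error_message = None
try:
    data.reshape(-1, 3)
except ValueError as err:
    error_message = str(err)
    print("reshape failed:", error_message)
```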

3.3 flatten

flatten converts a multi-dimensional array into a 1-dimensional array. (The same can be done with reshape.)

before = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]]], np.float32)
print("before: ")
print(before)

after = before.flatten()
print("after: ")
print(after)

before: 
[[[1. 2. 3. 4.]
  [5. 6. 7. 8.]]]
after: 
[1. 2. 3. 4. 5. 6. 7. 8.]
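
One difference between the two approaches is worth knowing: flatten always returns a copy, while reshape returns a view of the same data when it can. The sketch below assumes a freshly created (contiguous) array, for which reshape(-1) is a view. The names `flat_copy` and `flat_view` are illustrative.

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], np.float32)

flat_copy = before.flatten()    # always a new, independent array
flat_view = before.reshape(-1)  # a view here, because `before` is contiguous

flat_copy[0] = 50               # does not touch `before`
print(before[0, 0])             # 1.0

flat_view[0] = 100              # writes through to `before`
print(before[0, 0])             # 100.0
```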

4. indexing & slicing

Indexing

array[1][2] can also be written as array[1, 2].
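
A short sketch of the two indexing forms on a 2 x 4 array (the name `array` is illustrative):

```python
import numpy as np

array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(array[1][2])   # 7: take row 1 first, then element 2 of that row
print(array[1, 2])   # 7: the same element in a single indexing step

array[0, 0] = 10     # the comma form can also be used for assignment
print(array[0][0])   # 10
```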

Slicing

The row part and the column part can be sliced separately.
(Useful for extracting a subset.)

np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], np.float32)
print(np_array)		# the whole array
print(np_array[:,2:4])	# row: all, col: columns 2 to 3
print(np_array[1,1:3])	# row: row 1, col: columns 1 to 2

[[1. 2. 3. 4.]
 [5. 6. 7. 8.]]
 
[[3. 4.]
 [7. 8.]]
 
[6. 7.]
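
Slices also accept a step and negative indices, just like Python lists. A minimal sketch reusing the same array (the result names are illustrative):

```python
import numpy as np

np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], np.float32)

every_second_col = np_array[:, ::2]  # columns 0 and 2 of every row
last_col = np_array[:, -1]           # last column of every row
first_two = np_array[0, :2]          # first two elements of row 0

print(every_second_col)
print(last_col)
print(first_two)
```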

5. create array

5.1 arange

arange creates an array from a given range of values.

# int type 0 ~ 9
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# arange(start, stop, step): stop itself is not included
>>> np.arange(0, 5, 0.5)
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

# Used together with reshape.
>>> np.arange(10).reshape(-1, 5)
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

5.2 ones & zeros

Create an array with every element initialized to one value (0 or 1).

>>> np.zeros(10, np.int8)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
>>> np.ones(10)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> np.ones((3, 4), dtype = np.int8)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

5.3 identity

Create an identity matrix (I).

>>> np.identity(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
       
>>> np.identity(n=3, dtype=np.int8)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int8)

5.4 eye

Create a matrix with ones on a diagonal. Unlike identity, the matrix does not have to be square.

>>> np.eye(3, 5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])

# The k argument shifts the index where the diagonal starts.
>>> np.eye(3, 5, k = 2)
array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
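
Two more properties of eye, shown as a small sketch (the names `same` and `lower` are illustrative): with a single size argument it matches identity, and a negative k moves the diagonal below the main one.

```python
import numpy as np

# With only a size, eye produces the same matrix as identity.
same = np.array_equal(np.eye(3), np.identity(3))
print(same)   # True

# A negative k starts the diagonal below the main diagonal.
lower = np.eye(3, 5, k=-1)
print(lower)
```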

5.5 diag

Extract the values on the diagonal of a matrix.

>>> np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> np_array
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
       

>>> np.diag(np_array)
array([1, 6])

# The k argument shifts the index where the diagonal starts.
>>> np.diag(np_array, k = 2)
array([3, 8])
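
np.diag also works in the other direction: given a 1-dimensional array, it builds a matrix with those values on the diagonal. A minimal sketch (the names are illustrative):

```python
import numpy as np

diagonal_values = np.array([1, 6])

matrix = np.diag(diagonal_values)   # 1-D input: build a diagonal matrix
print(matrix)
# [[1 0]
#  [0 6]]

extracted = np.diag(matrix)         # 2-D input: extract the diagonal again
print(extracted)                    # [1 6]
```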

6. Operation by functions

6.1 sum

>>> np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> np_array.sum()
36

6.2 axis

The dimension along which an operation is applied.

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(np_array)
print(np_array.shape)

np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(np_array)
print(np_array.shape)

np_array = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
			[[1, 2, 3, 4], [5, 6, 7, 8]],
			[[1, 2, 3, 4], [5, 6, 7, 8]]])
print(np_array)
print(np_array.shape)

[1 2 3 4 5 6 7 8]
# axis = 0
(8,)

[[1 2 3 4]
 [5 6 7 8]]
# (axis = 0, axis = 1)
(2, 4)

[[[1 2 3 4]
  [5 6 7 8]]

 [[1 2 3 4]
  [5 6 7 8]]

 [[1 2 3 4]
  [5 6 7 8]]]
# (axis = 0, axis = 1, axis = 2)
(3, 2, 4)

As shown, the most recently added dimension (the outermost one, listed first in the shape) is axis = 0.

When an operation is applied along an axis:

>>> np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> np_array
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
       
# axis = 0: sum down the rows, giving one result per column
>>> np_array.sum(axis = 0)
array([ 6,  8, 10, 12])

# axis = 1: sum across the columns, giving one result per row
>>> np_array.sum(axis = 1)
array([10, 26])
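
A useful way to read the axis argument is that the chosen axis disappears from the shape of the result. A minimal sketch with a (3, 2, 4) array (the values from arange are arbitrary):

```python
import numpy as np

np_array = np.arange(24).reshape(3, 2, 4)   # shape (3, 2, 4)

shape_axis0 = np_array.sum(axis=0).shape    # (2, 4): axis 0 is collapsed
shape_axis1 = np_array.sum(axis=1).shape    # (3, 4)
shape_axis2 = np_array.sum(axis=2).shape    # (3, 2)

# keepdims=True keeps the collapsed axis with length 1.
shape_kept = np_array.sum(axis=1, keepdims=True).shape   # (3, 1, 4)

print(shape_axis0, shape_axis1, shape_axis2, shape_kept)
```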

Exponential, logarithmic, trigonometric and hyperbolic functions are also available.
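
A few of these element-wise functions, as a minimal sketch (the input `x` is arbitrary):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

exp_x = np.exp(x)        # e ** x for each element
log_x = np.log(x + 1)    # natural log; the +1 avoids log(0)
sin_x = np.sin(x)        # trigonometric
tanh_x = np.tanh(x)      # hyperbolic

print(exp_x)
print(log_x)
print(sin_x)
print(tanh_x)
```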

7. concatenate

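Functions that join numpy arrays.

7.1 vstack & hstack

Functions that join two vectors along an axis: vstack stacks them vertically as rows, hstack joins them horizontally.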

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 5, 6, 7])

>>> a
array([1, 2, 3, 4])
>>> b
array([4, 5, 6, 7])

# row
>>> np.vstack((a,b))
array([[1, 2, 3, 4],
       [4, 5, 6, 7]])
       
# col
>>> np.hstack((a,b))
array([1, 2, 3, 4, 4, 5, 6, 7])

7.2 concatenate

A function that joins arrays along the given axis.

>>> a = np.array([[1, 2, 3, 4]])
>>> b = np.array([[4, 5, 6, 7]])

>>> a
array([[1, 2, 3, 4]])
>>> b
array([[4, 5, 6, 7]])

>>> np.concatenate((a,b), axis=0)
array([[1, 2, 3, 4],
       [4, 5, 6, 7]])
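
Joining along axis=1 requires the arrays to have the same number of rows, so a row vector usually has to be transposed first. A minimal sketch (the arrays here are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

rows = np.concatenate((a, b), axis=0)    # append b as a new row: shape (3, 2)
cols = np.concatenate((a, b.T), axis=1)  # b.T has shape (2, 1): result (2, 3)

print(rows)
print(cols)
```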

8. Operation between arrays

Basic arithmetic operations between arrays are supported.

8.1 When the shapes are the same

>>> np_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# The operation is applied between the elements at the same position.
>>> np_array + np_array
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16]])

8.2 When the shapes differ - broadcasting

Operations are also supported between arrays of different shapes, as long as the shapes are compatible.

>>> a = np.array([[1, 2, 3], [4, 5, 6]], float)
>>> a
array([[1., 2., 3.],
       [4., 5., 6.]])

# matrix + scalar
>>> a + 1
array([[2., 3., 4.],
       [5., 6., 7.]])

>>> b = np.array([3, 2, 1], float)
>>> b
array([3., 2., 1.])

# matrix + vector
>>> a + b
array([[4., 4., 4.],
       [7., 7., 7.]])
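
Broadcasting compares shapes from the last axis backwards: two lengths are compatible when they are equal or one of them is 1. A minimal sketch of a compatible pair and an incompatible pair (the names are illustrative):

```python
import numpy as np

row = np.array([10, 20, 30])   # shape (3,)
col = np.array([[1], [2]])     # shape (2, 1)

# Shapes (2, 1) and (3,) broadcast to (2, 3).
table = col + row
print(table)
# [[11 21 31]
#  [12 22 32]]

# Shapes (2, 3) and (2,) do not match: the last lengths are 3 and 2.
broadcast_failed = False
try:
    np.ones((2, 3)) + np.array([1, 2])
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)
```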

8.3 dot product

Matrix multiplication.

>>> a = np.array([[1, 2, 3], [4, 5, 6]], float)
>>> a
array([[1., 2., 3.],
       [4., 5., 6.]])
       
>>> b = np.array([[1, 2], [1, 1], [1, 1]], float)
>>> b
array([[1., 2.],
       [1., 1.],
       [1., 1.]])
       
>>> a.dot(b)
array([[ 6.,  7.],
       [15., 19.]])
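
The @ operator gives the same matrix product as dot, while * stays element-wise. A minimal sketch with the same matrices (the result names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)
b = np.array([[1, 2], [1, 1], [1, 1]], float)

product = a @ b       # matrix product, same result as a.dot(b)
elementwise = a * a   # element-wise product, not a matrix product

print(product)
print(elementwise)
```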

8.4 transpose

Transposed matrix.

>>> a = np.array([[1, 2, 3], [4, 5, 6]], float)
>>> a
array([[1., 2., 3.],
       [4., 5., 6.]])
       
>>> a.transpose()
array([[1., 4.],
       [2., 5.],
       [3., 6.]])
       
>>> a.T
array([[1., 4.],
       [2., 5.],
       [3., 6.]])

8.5 comparison

8.5.1 All & Any

all returns whether every element of the array satisfies the condition (and).
any returns whether at least one element of the array satisfies the condition (or).

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.all(a < 10)
True
>>> np.all(a > 3)
False

>>> np.any(a > 3)
True
>>> np.any(a > 10)
False

# This works because a comparison returns an element-wise boolean array.
>>> a > 5
array([False, False, False, False, False, False,  True,  True,  True,  True])
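
Comparisons also work between two arrays, and the boolean results can be combined with the logical functions. A minimal sketch (the arrays are illustrative):

```python
import numpy as np

a = np.array([1, 3, 0], float)
b = np.array([5, 2, 1], float)

greater = a > b                            # element-wise comparison of two arrays
both = np.logical_and(a > 0, a < 3)        # True where both conditions hold
either = np.logical_or(a > 2, b > 4)       # True where at least one holds
negated = np.logical_not(greater)          # flips every value

print(greater, both, either, negated)
```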

9. Other frequently used features

9.1 np.where

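np.where is useful for finding the indices that satisfy a condition, or for choosing a value per element based on a condition.

where(condition, value if True, value if False)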

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a > 5, 1, 0)
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

where(condition)

# Returns the indices where a > 5 holds.
>>> np.where(a > 5)
(array([6, 7, 8, 9]),)
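
The result of where(condition) is a tuple with one index array per dimension, which is why the output above is wrapped in parentheses. A minimal sketch (the arrays are illustrative):

```python
import numpy as np

a = np.array([3, 9, 1, 7])

indices = np.where(a > 5)     # tuple holding one index array (a is 1-D)
print(indices[0])             # [1 3]
print(a[indices])             # [9 7]: the tuple can be used directly as an index

m = np.array([[1, 8], [9, 2]])
rows, cols = np.where(m > 5)  # 2-D input: row indices and column indices
print(rows, cols)             # [0 1] [1 0]
```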

9.2 argmax & argmin

Return the index of the max or min value.

>>> a = np.array([2, 4, 6, 8, 10, 1, 3, 5, 7, 9], np.int8)
>>> np.argmax(a)
4
>>> np.argmin(a)
5

# The comparison can also be done along an axis.
>>> a = np.array([[1, 9, 6, 4], [3, 6, 2, 8]])
>>> a
array([[1, 9, 6, 4],
       [3, 6, 2, 8]])
>>> np.argmax(a, axis=0)
array([1, 0, 0, 1])
>>> np.argmax(a, axis=1)
array([1, 3])
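
Without an axis, argmax on a 2-dimensional array returns an index into the flattened array. The sketch below uses np.unravel_index to turn it back into a (row, column) position. The names are illustrative.

```python
import numpy as np

a = np.array([[1, 9, 6, 4], [3, 6, 2, 8]])

flat_index = np.argmax(a)                         # index into the flattened array
position = np.unravel_index(flat_index, a.shape)  # (row, column) of the max value

print(flat_index)    # 1
print(a[position])   # 9, the same value as a.max()
```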

9.3 boolean index

Used to extract only the elements at the positions where the condition is True.

>>> a = np.arange(8)
>>> a 
array([0, 1, 2, 3, 4, 5, 6, 7])
>>> a > 2
array([False, False, False,  True,  True,  True,  True,  True])

# Only the positions that are True are selected.
>>> a[a > 2]
array([3, 4, 5, 6, 7])

>>> index = a > 2
>>> a[index]
array([3, 4, 5, 6, 7])
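
Conditions can be combined with & and |, and a boolean index can also be used on the left side of an assignment. A minimal sketch (the names are illustrative):

```python
import numpy as np

a = np.arange(8)

# Each condition needs its own parentheses, because & and | bind tighter than < and >.
middle = a[(a > 2) & (a < 6)]
outer = a[(a < 2) | (a > 5)]

# Assigning through a boolean index changes only the selected elements.
capped = a.copy()
capped[capped > 5] = 5

print(middle)   # [3 4 5]
print(outer)    # [0 1 6 7]
print(capped)   # [0 1 2 3 4 5 5 5]
```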

9.4 fancy index

An array itself can be used as an index to extract elements.

>>> a = np.array([1, 5, 10, 42, 2, 3, 79, 28])
>>> b = np.array([3, 3, 3, 0, 0, 7, 7, 1])

# The b array is used as the index into a.
>>> a[b]
array([42, 42, 42,  1,  1, 28, 28,  5])
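
The same idea extends to 2 dimensions by passing one index array per axis. The index arrays are paired up element by element. A minimal sketch (the matrix and index arrays are illustrative):

```python
import numpy as np

m = np.array([[1, 4], [9, 16]], float)
rows = np.array([0, 0, 1, 1, 0])
cols = np.array([0, 1, 1, 1, 1])

# Picks m[0, 0], m[0, 1], m[1, 1], m[1, 1], m[0, 1].
picked = m[rows, cols]
print(picked)

# For a 1-D array, take() is equivalent to indexing with an array.
a = np.array([1, 5, 10, 42])
taken = a.take(np.array([3, 3, 0]))
print(taken)
```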

References.
https://ko.wikipedia.org/wiki/NumPy
https://laboputer.github.io/machine-learning/2020/04/25/numpy-quickstart/#item1

현리
