Numpy

Nam Eun-Ji · January 13, 2021

Numpy?

Numerical Python
A Python package for high-performance scientific computing and data analysis
Official website

pip install numpy

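Features

Provides ndarray, a fast, memory-efficient multidimensional array type that supports vectorized arithmetic and broadcasting.
Provides standard mathematical functions that operate quickly on entire arrays without writing loops.
Can read and write array data to and from disk.
Supports linear algebra, random number generation, and Fourier transforms, and provides tools for integrating code written in C, C++, and Fortran.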
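
As a rough illustration of loop-free math and disk I/O, the sketch below applies a function to a whole array and round-trips an array through a file. The file name example.npy and the temporary directory are assumptions made only for this example.

```python
import os
import tempfile

import numpy as np

x = np.arange(5)                 # [0 1 2 3 4]

# Element-wise math on the whole array, no explicit loop
roots = np.sqrt(x)

# Save to disk and load back (file name chosen arbitrarily for this sketch)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, x)
    loaded = np.load(path)

print(roots)
print(np.array_equal(x, loaded))  # True
```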

Usage

Creating an ndarray

arange()
array([])

import numpy as np

a = np.arange(5)
b = np.array([0,1,2,3,4])  # convert a Python list to a NumPy ndarray

c = np.array([0,1,2,3,'4'])
d = np.ndarray((5,), np.int64, np.array([0,1,2,3,4], dtype=np.int64))  # low-level constructor: shape, dtype, buffer

print(a)  # [0 1 2 3 4]
print(b)  # [0 1 2 3 4]
print(c)  # ['0' '1' '2' '3' '4']
print(d)  # [0 1 2 3 4]

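In c, only '4' is a string, yet every element was stored as a string. This is because all elements of a NumPy array must share the same type. Any number can be converted to a string, but not every string can be converted to a number, so NumPy converts everything to the string type.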
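
The same rule applies between numeric types, and the conversion can be reversed when every string happens to be a valid number. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

mixed = np.array([0, 1, 2.5])      # ints are promoted to float
print(mixed.dtype)                 # float64

c = np.array([0, 1, 2, 3, '4'])    # everything becomes a string
as_int = c.astype(int)             # works because every string is a valid integer
print(as_int)                      # [0 1 2 3 4]
```

If any element were a non-numeric string such as 'four', astype(int) would raise a ValueError instead.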

Creating special matrices

NumPy provides functions for mathematically meaningful matrices.

Identity matrix

a = np.eye(3)
print(a)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

Zero matrix

a = np.zeros([2,3])
print(a)
# [[0. 0. 0.]
#  [0. 0. 0.]]

Matrix of ones

a = np.ones([3,3])
print(a)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]
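
Two details that the outputs above hint at: these functions return floats by default, and the identity matrix leaves another matrix unchanged under matrix multiplication. A short sketch to check both:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
I = np.eye(3)

# Multiplying by the identity leaves A unchanged
print(np.array_equal(I @ A, A))      # True

# Pass dtype to get integers instead of the default floats
Z = np.zeros([2, 3], dtype=int)
print(Z)
# [[0 0 0]
#  [0 0 0]]
```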

Size: size

The number of elements in the array

>>> a = np.arange(10)
>>> a.size
10
>>> a = a.reshape(2,5)
>>> a.size
10

Shape: shape

The shape of the array

>>> a = np.arange(10)
>>> a.shape
(10,)

>>> a = a.reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> a.shape
(2, 5)

Number of axes: ndim

The number of axes of the array

>>> a = np.arange(10)
>>> a.ndim
1
>>> b = np.arange(10).reshape(2,5)
>>> b.ndim
2
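
The three attributes are tied together: ndim is the length of shape, and size is the product of the entries in shape. A quick check:

```python
import numpy as np

b = np.arange(10).reshape(2, 5)

print(b.ndim == len(b.shape))       # True
print(b.size == np.prod(b.shape))   # True
```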

reshape

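reshape raises an error when the requested shape does not match the number of elements.
In the code below, the array has 10 elements, but a 3 x 3 shape holds only 9, so the call that creates c raises an error.
Reference: https://yganalyst.github.io/data_handling/memo_5/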

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> b = np.arange(10).reshape(2,5)
>>> b
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> c = np.arange(10).reshape(3,3)  # raises an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 10 into shape (3,3)
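
One dimension can be left for NumPy to work out by passing -1, as long as the element count is divisible by the dimensions given. A small sketch:

```python
import numpy as np

a = np.arange(10)

b = a.reshape(2, -1)   # -1 is inferred as 5
c = a.reshape(-1, 2)   # -1 is inferred as 5
print(b.shape)         # (2, 5)
print(c.shape)         # (5, 2)

try:
    a.reshape(3, -1)   # 10 is not divisible by 3
except ValueError as e:
    print("ValueError:", e)
```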

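Checking the type: dtype

When the elements cannot be converted to a common type, NumPy can still give the array a single dtype by storing them as object (code c). Recent NumPy versions raise a ValueError for nested lists of uneven length like this one unless dtype=object is passed explicitly.

a = np.arange(6).reshape(2,3)
print(a.dtype)  # int64 (platform dependent; may be int32 on some systems)
print(type(a))  # <class 'numpy.ndarray'>

b = np.array([0,1,2,3,'4',5])
print(b.dtype)  # <U21

c = np.array([0,1,2,3,[4,5],6], dtype=object)
print(c)        # [0 1 2 3 list([4, 5]) 6]
print(c.dtype)  # object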

Checking element types

a = np.array([0,1,2,3,'4',5])
print(type(a[0]), type(a[4])) #<class 'numpy.str_'> <class 'numpy.str_'>

b = np.array([0,1,2,3,[4,5],6], dtype=object)
print(type(b[0]), type(b[4]))  # <class 'int'> <class 'list'>

Broadcasting

See the official documentation.

A = np.arange(9).reshape(3,3)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

# multiplying A by the scalar 2
print(A * 2)
# [[ 0  2  4]
#  [ 6  8 10]
#  [12 14 16]]

# adding 2 to A
print(A + 2)
# [[ 2  3  4]
#  [ 5  6  7]
#  [ 8  9 10]]

# adding a 3-element 1-D array (treated as a 1 x 3 row) to a 3 x 3 matrix
B = np.array([1, 2, 3])
print(A+B)
# [[ 1  3  5]
#  [ 4  6  8]
#  [ 7  9 11]]

# adding a 3 x 1 matrix to a 3 x 3 matrix
C = np.array([[1], [2], [3]])
print(A+C)
# [[ 1  2  3]
#  [ 5  6  7]
#  [ 9 10 11]]

# adding a 2-element array (1 x 2) to a 3 x 3 matrix is not allowed
A = np.arange(9).reshape(3,3)
D = np.array([1, 2])
print(A+D)
# ValueError: operands could not be broadcast together with shapes (3,3) (2,)
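
The cases above follow one rule: shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them is 1. The sketch below reuses the same arrays and adds one combination not shown above, (3,) with (3, 1), where both operands are stretched.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)     # shape (3, 3)
B = np.array([1, 2, 3])            # shape (3,)
C = np.array([[1], [2], [3]])      # shape (3, 1)
D = np.array([1, 2])               # shape (2,)

print((A + B).shape)   # (3, 3)
print((B + C).shape)   # (3, 3): (3,) and (3, 1) stretch against each other
print(B + C)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]

try:
    A + D              # last axes are 3 and 2: neither equal nor 1
except ValueError:
    print("(3, 3) and (2,) are not compatible")
```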

numpy vs list

print([1,2]+[3,4])  # [1, 2, 3, 4]
print([1,2]+3) # TypeError: can only concatenate list (not "int") to list

print(np.array([1,2])+np.array([3,4])) # [4 6]
print(np.array([1,2])+3)  # [4 5]
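
To get the element-wise results with plain lists, the loop has to be written out, for example with a list comprehension. A small comparison:

```python
import numpy as np

xs = [1, 2]
ys = [3, 4]

# Plain Python: explicit element-wise loops
list_sum = [x + y for x, y in zip(xs, ys)]   # [4, 6]
list_plus3 = [x + 3 for x in xs]             # [4, 5]

# NumPy: the same results without a loop
arr_sum = np.array(xs) + np.array(ys)
arr_plus3 = np.array(xs) + 3

print(list_sum == arr_sum.tolist())       # True
print(list_plus3 == arr_plus3.tolist())   # True
```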

index

[row, column]

A = np.arange(9).reshape(3,3)

B = A[0]
print(B)       # [0 1 2]
print(A[0,1])  # 1
print(B[1])    # 1

slice

Reference: https://076923.github.io/posts/Python-numpy-5/

Slicing rows
To slice rows only, use [start : stop : step]

A = np.arange(9).reshape(3,3)
print(A[:-1])
# [[0 1 2]
#  [3 4 5]]

Slicing columns
Slicing columns requires access to the second dimension, so use the [rows, columns] form:
[row start : row stop, column start : column stop]

A = np.arange(9).reshape(3,3)
print(A[:,:-1])
# [[0 1]
#  [3 4]
#  [6 7]]

An entry without : is a plain index.

print(A[1,:2])  # [3 4]

Examples

code	result
A[:, 2:]	[[2] [5] [8]]
A[:, 1:]	[[1 2] [4 5] [7 8]]
A[:, :]	[[0 1 2] [3 4 5] [6 7 8]]
A[:, -1:]	[[2] [5] [8]]
A[:, -2:]	[[1 2] [4 5] [7 8]]
A[:, -3:]	[[0 1 2] [3 4 5] [6 7 8]]
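
The step part of [start : stop : step] is not used in the examples above. A short sketch of what it does on the same 3 x 3 array:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)

print(A[::2])       # every second row
# [[0 1 2]
#  [6 7 8]]

print(A[:, ::2])    # every second column
# [[0 2]
#  [3 5]
#  [6 8]]

print(A[::-1])      # rows in reverse order
# [[6 7 8]
#  [3 4 5]
#  [0 1 2]]
```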

Random numbers: random

np.random.random()
Generates one random float in the range [0, 1)

print(np.random.random())  # 0.5384151669289275

np.random.randint()
.randint(start, stop) generates one random integer from start up to, but not including, stop

print(np.random.randint(0,10))  # random integer from 0 to 9
# 5

np.random.choice()
Picks one of the values in the given list at random.

print(np.random.choice([0,1,2,3,4,5,6,7,8,9]))  # 6

np.random.permutation()
Returns the elements in a randomly shuffled order.

print(np.random.permutation(10))   # [8 7 9 6 0 4 2 3 1 5]
print(np.random.permutation([0,1,2,3,4,5,6,7,8,9])) # [5 7 6 9 4 3 2 8 0 1]

np.random.normal()
Draws random samples from a given distribution.

# normal distribution
# mean (loc), standard deviation (scale), number of samples (size)
print(np.random.normal(loc=0, scale=1, size=5))
# [-1.72691092 -1.30053836  1.35549366 -0.15073381  0.77008151]

# uniform distribution
# minimum (low), maximum (high), number of samples (size)
print(np.random.uniform(low=-1, high=1, size=5))
# [ 0.94862887 -0.02163201 -0.15102288  0.88740309  0.69740139]
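
The sample outputs above differ on every run. When repeatable results are needed, one option is to fix the global seed first. A minimal sketch; the seed value 0 is arbitrary:

```python
import numpy as np

np.random.seed(0)                       # fix the global seed (0 is arbitrary)
first = np.random.normal(loc=0, scale=1, size=5)

np.random.seed(0)                       # same seed, same sequence
second = np.random.normal(loc=0, scale=1, size=5)

print(np.array_equal(first, second))    # True

# uniform samples stay inside [low, high)
samples = np.random.uniform(low=-1, high=1, size=1000)
print(samples.min() >= -1, samples.max() < 1)   # True True
```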

What are the normal and uniform distributions?

np.random.uniform()

Transpose

The transpose of a matrix is obtained by exchanging its rows and columns, that is, by reflecting it across its main diagonal.

arr.T

A = np.arange(24).reshape(2,3,4)
print(A)    # A has shape (2,3,4)
print('----------------------------')
print(A.T)  # the transpose of A
print('----------------------------')
print(A.T.shape) # the transpose of A has shape (4,3,2)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
----------------------------
[[[ 0 12]
  [ 4 16]
  [ 8 20]]

 [[ 1 13]
  [ 5 17]
  [ 9 21]]

 [[ 2 14]
  [ 6 18]
  [10 22]]

 [[ 3 15]
  [ 7 19]
  [11 23]]]
----------------------------
(4, 3, 2)

np.transpose

A general transpose function that lets you specify how the axes of the array are permuted.

# For a 3-D array, np.transpose(A, (2,1,0)) is exactly the same as A.T.

A = np.arange(24).reshape(2,3,4)
B = np.transpose(A, (2,0,1))
print(A)        # A has shape (2,3,4)
print('----------------------------')
print(B)        # B takes A's 3rd, 1st, and 2nd axes as its own 1st, 2nd, and 3rd axes
print('----------------------------')
print(B.shape)  # B has shape (4,2,3)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
----------------------------
[[[ 0  4  8]
  [12 16 20]]

 [[ 1  5  9]
  [13 17 21]]

 [[ 2  6 10]
  [14 18 22]]

 [[ 3  7 11]
  [15 19 23]]]
----------------------------
(4, 2, 3)
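
The axis permutation can also be checked element by element: with axes (2,0,1), B[i, j, k] equals A[j, k, i]. A short sketch that verifies this and the A.T equivalence:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
B = np.transpose(A, (2, 0, 1))

# Reversing all axes is the same as A.T
print(np.array_equal(np.transpose(A, (2, 1, 0)), A.T))   # True

# B[i, j, k] == A[j, k, i]
print(B[3, 1, 2] == A[1, 2, 3])                          # True

# The method form accepts the axes directly
print(A.transpose(2, 0, 1).shape)                        # (4, 2, 3)
```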

Statistics

Sum: sum
Mean: mean
Standard deviation: std
Median: median

nums = np.arange(10)
print(f'sum : {nums.sum()}')
print(f'mean : {nums.mean()}')
print(f'standard deviation : {nums.std()}')
print(f'median : {np.median(nums)}')

sum : 45
mean : 4.5
standard deviation : 2.8722813232690143
median : 4.5

Squaring: square

print(np.square(nums))
# [ 0  1  4  9 16 25 36 49 64 81]
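
Combining square with mean reproduces the standard deviation from its definition, which also shows that std() divides by n (the population form) by default. A sketch under that reading:

```python
import numpy as np

nums = np.arange(10)

# Population standard deviation: sqrt(mean((x - mean)^2))
manual_std = np.sqrt(np.mean(np.square(nums - nums.mean())))
print(np.isclose(manual_std, nums.std()))   # True

# ddof=1 divides by n - 1 instead (sample standard deviation)
print(nums.std(ddof=1) > nums.std())        # True
```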

NumPy example

Mean squared error

The mean of the squared differences between the measured (predicted) values and the values taken to be true.
It is one measure of accuracy.

$MeanSquaredError = \frac{1}{n}\sum_{i=1}^{n}(Yprediction_{i}-Y_{i})^2$

error = (1/n) * np.sum(np.square(predictions-labels))

n = 3
predictions = np.array([1,2,3])
labels = np.array([1,1,1])
error = (1/n) * np.sum(np.square(predictions-labels))
print(error)  # 1.6666666666666665
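
The same calculation can be wrapped in a small function that uses np.mean, so n does not have to be tracked by hand. The name mean_squared_error is a hypothetical helper for this sketch, not a NumPy function.

```python
import numpy as np

def mean_squared_error(predictions, labels):
    """Hypothetical helper: average of the squared differences."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return np.mean(np.square(predictions - labels))

mse = mean_squared_error([1, 2, 3], [1, 1, 1])
print(mse)   # about 1.667, i.e. (0 + 1 + 4) / 3
```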