[statistics] χ² test

박경국 · December 2, 2021


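Assumptions of parametric tests such as the two-sample t-test:
1) The samples must be independent of each other.
2) The samples must be (approximately) normally distributed.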

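from scipy.stats import normaltest
import numpy as np

sample = np.random.normal(size=1000)  # drawn from a standard normal distribution
# normaltest H0: the sample comes from a normal distribution; p < 0.05 suggests it does not
print(normaltest(sample))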

3) The variances of the two samples being compared must be statistically similar (i.e., a test for equal variances gives p > 0.05).
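A minimal sketch of checking assumptions 2) and 3) before running a t-test. The two groups and the exponential sample are hypothetical data generated for illustration. Using `scipy.stats.levene` for the equal-variance check and falling back to Welch's t-test (`equal_var=False`) are common choices assumed here, not something the original specifies.

```python
import numpy as np
from scipy.stats import normaltest, levene, ttest_ind

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Two hypothetical groups: same mean, but group_b has three times the spread
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=0.0, scale=3.0, size=500)

# Assumption 2): normaltest H0 = the sample comes from a normal distribution
for name, g in [("group_a", group_a), ("group_b", group_b)]:
    print(f"{name} normaltest p={normaltest(g).pvalue:.4g}")

# A clearly non-normal sample for contrast: exponential data is right-skewed
skewed = rng.exponential(size=500)
p_skewed = normaltest(skewed).pvalue
print(f"exponential sample normaltest p={p_skewed:.4g}")

# Assumption 3): Levene's test H0 = the groups have equal variances
p_var = levene(group_a, group_b).pvalue
equal_var = bool(p_var > 0.05)
print(f"Levene p={p_var:.4g}, equal variances assumed: {equal_var}")

# If the variances differ, Welch's t-test (equal_var=False) is a common fallback
t_result = ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"t-test (equal_var={equal_var}): t={t_result.statistic:.2f}, p={t_result.pvalue:.4g}")
```

Here the exponential sample and the unequal spreads are chosen so that both checks clearly fail, which shows what a rejected assumption looks like.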

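Characteristics of nonparametric tests such as the χ² test:
1) Well suited to modeling categorical data.
2) Can be used even when the data contain extreme outliers.
3) Also called distribution-free methods.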

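For a one-sample (goodness-of-fit) χ² test, the expected frequencies are usually taken as the mean of the observed counts, i.e. the same count in every category.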

$$\chi^2 = \sum_i \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i}$$
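The formula can be evaluated by hand with NumPy. This is a minimal sketch using the observed counts 5, 23, 26, 19, 24, 23 and a uniform expectation, where each expected count is the mean of the observed counts.

```python
import numpy as np

# Observed counts for six categories (same values as ns_obs)
observed = np.array([5, 23, 26, 19, 24, 23])

# Uniform expectation: each category is expected to receive the mean count
expected = np.full(observed.shape, observed.mean())

# chi^2 = sum over categories of (observed - expected)^2 / expected
chi2_stat = np.sum((observed - expected) ** 2 / expected)

print(expected)             # [20. 20. 20. 20. 20. 20.]
print(round(chi2_stat, 4))  # 14.8
```

The total is 120, so every expected count is 20. The squared deviations 225, 9, 36, 1, 16, 9 sum to 296, and 296 / 20 = 14.8.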
 
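import numpy as np
from scipy.stats import chisquare

# Observed counts for six categories (one-sample goodness-of-fit test)
ns_obs = np.array([[5, 23, 26, 19, 24, 23]])

# axis=None flattens the 2-D array, so all six counts are tested together
print(chisquare(ns_obs, axis=None))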
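`chisquare` returns a statistic and a p-value. The following is a sketch of where the p-value comes from. With k categories, the statistic is compared against a χ² distribution with k − 1 degrees of freedom, and `scipy.stats.chi2.sf` gives the upper-tail probability. The 0.05 significance level is the conventional choice and is an assumption here.

```python
import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([5, 23, 26, 19, 24, 23])

# With no f_exp argument, chisquare assumes equal expected frequencies (the mean count)
result = chisquare(observed)

# Degrees of freedom for a goodness-of-fit test with k categories: k - 1
dof = observed.size - 1
p_manual = chi2.sf(result.statistic, dof)  # upper-tail probability of the chi2 distribution

print(f"chi2={result.statistic:.2f}, p={result.pvalue:.4f}, manual p={p_manual:.4f}")
if result.pvalue < 0.05:
    print("Reject H0: the counts are not uniformly distributed across categories")
```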

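Examples of a χ² test of independence between two categorical variables:
ex1) Whether a person wears a mask vs. whether they are infected with COVID-19
ex2) Leisure spending by marital status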

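import pandas as pd
from scipy.stats import chi2_contingency

# customer: the author's DataFrame (loading not shown); crosstab builds a table of counts
a = pd.crosstab(customer['marriage'], customer['consum_alchol'])
# Returns the chi2 statistic, p-value, degrees of freedom, and expected frequencies
print(chi2_contingency(a))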
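To see what `chi2_contingency` computes, here is a minimal sketch on a hypothetical 2×3 table of counts. The numbers are made up for illustration and are not taken from the `customer` data. Under independence, each cell's expected count is row total × column total / grand total. The statistic uses the same formula as the one-sample case.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table (rows: married / single,
# columns: low / medium / high spending); counts are made up for illustration
table = np.array([[30, 20, 10],
                  [20, 25, 15]])

stat, p, dof, expected = chi2_contingency(table)

# Expected frequency of each cell under independence: row total * column total / grand total
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
manual_expected = row_totals * col_totals / table.sum()

# Same (observed - expected)^2 / expected formula, summed over all cells
manual_stat = ((table - manual_expected) ** 2 / manual_expected).sum()

print(f"chi2={stat:.3f}, p={p:.4f}, dof={dof}")  # dof = (rows - 1) * (cols - 1)
print(np.allclose(expected, manual_expected), np.isclose(stat, manual_stat))
```

For a 2×2 table such as ex1, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`). Its statistic will therefore differ from the plain formula unless `correction=False` is passed.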