[statistics] χ²-test

박경국 · December 2, 2021


Why use the χ²-test

1. The t-test can only be used when several conditions hold

1) The samples must be independent of each other.
2) The samples must be normally distributed.

  • Check this with normaltest from scipy.stats
from scipy.stats import normaltest
import numpy as np

# Draw 1000 values from a standard normal distribution
sample = np.random.normal(size=1000)
# A p-value above 0.05 means normality is not rejected
normaltest(sample)

3) The variances of the two samples being compared must be statistically similar (an equal-variance test gives p > 0.05).
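
Conditions 2) and 3) above can be checked in code. The following is a minimal sketch under stated assumptions: the post only names `normaltest`, so using Levene's test (`scipy.stats.levene`) for the equal-variance condition is an assumption made here, and the three samples are synthetic data generated purely for illustration.

```python
import numpy as np
from scipy.stats import levene, normaltest

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic samples (illustrative only, not data from the post)
normal_a = rng.normal(loc=0.0, scale=1.0, size=1000)
normal_b = rng.normal(loc=0.0, scale=5.0, size=1000)  # much larger variance
skewed = rng.exponential(scale=1.0, size=1000)        # clearly not normal

# Condition 2: normality. A small p-value rejects the hypothesis of normality.
p_normal = normaltest(normal_a).pvalue
p_skewed = normaltest(skewed).pvalue

# Condition 3: similar variances. Levene's test is one common choice (assumption).
p_variance = levene(normal_a, normal_b).pvalue

print(f"normaltest, normal sample: p = {p_normal:.3f}")
print(f"normaltest, skewed sample: p = {p_skewed:.3g}")
print(f"levene, scale 1 vs 5:      p = {p_variance:.3g}")
```

With these samples, the skewed sample and the unequal-variance pair should both give p < 0.05, which means those conditions fail and a t-test would not be appropriate for them.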

2. The χ²-test can be used even when the population does not follow a particular probability distribution.

1) It is well suited to modeling categorical data
2) It can be used even when there are extreme outliers
3) It is also called a distribution-free method

Setting up the hypotheses for the χ²-test

  1. One-sample χ²-test
  • Null hypothesis: the data follow a distribution similar to the expected one.

  • Alternative hypothesis: the data do not follow a distribution similar to the expected one.

  • The expected distribution is usually built from the mean of the data, so every category is expected to have the average observed count.

     $$\chi^2 = \sum_i \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i}$$
     
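import numpy as np
from scipy.stats import chisquare

# Observed counts for six categories
ns_obs = np.array([[5, 23, 26, 19, 24, 23]])
# axis=None flattens the array; expected counts default to the mean of the observed counts
chisquare(ns_obs, axis=None)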
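
To see how the formula and the `chisquare` call relate, the statistic can also be computed by hand. This is a rough sketch using the six observed counts from the example above. The variable names are made up for illustration, and it assumes the default behaviour of `chisquare`, where every category gets the same expected count.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([5, 23, 26, 19, 24, 23])         # counts from the example above
expected = np.full(observed.shape, observed.mean())  # 120 / 6 = 20 per category

# chi-square = sum of (observed_i - expected_i)^2 / expected_i
stat_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom = number of categories - 1
dof = observed.size - 1
p_manual = chi2.sf(stat_manual, dof)  # upper-tail probability

result = chisquare(observed)  # f_exp defaults to equal expected counts

print(f"manual: chi2 = {stat_manual:.2f}, p = {p_manual:.4f}")
print(f"scipy:  chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

The squared deviations are 225, 9, 36, 1, 16, and 9, so the statistic is 296 / 20 = 14.8 with 5 degrees of freedom. That exceeds the 5% critical value of about 11.07, so the null hypothesis of a uniform distribution would be rejected at that level.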

2. Two-sample χ²-test

  • Null hypothesis: the two variables are independent of each other.
  • Alternative hypothesis: the two variables are not independent of each other.
  • One of the variables must be categorical data; a continuous variable has to be binned into categories before it can go into the contingency table.

ex1) Mask wearing and COVID-19 infection
ex2) Leisure spending by marital status

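import pandas as pd
from scipy.stats import chi2_contingency

# customer: a DataFrame with 'marriage' and 'consum_alchol' columns (not defined in this post)
a = pd.crosstab(customer['marriage'], customer['consum_alchol'])
# Returns the χ² statistic, p-value, degrees of freedom, and expected frequencies
print(chi2_contingency(a))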
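
Since the `customer` DataFrame is not included in the post, here is a self-contained sketch of the same test on made-up data. The 100 rows below are hypothetical and only reuse the post's column names. Note that `chi2_contingency` applies Yates' continuity correction by default on a 2×2 table.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: 100 made-up customers, not the author's dataset
customer = pd.DataFrame({
    "marriage": ["married"] * 40 + ["single"] * 60,
    "consum_alchol": ["yes"] * 30 + ["no"] * 10 + ["yes"] * 20 + ["no"] * 40,
})

# Contingency table of counts: rows = marriage, columns = consum_alchol
table = pd.crosstab(customer["marriage"], customer["consum_alchol"])

# The result unpacks into statistic, p-value, degrees of freedom, expected counts
chi2_stat, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.5f}, dof = {dof}")
print(expected)  # counts expected if the two variables were independent
```

In this made-up table the expected counts under independence are 20 and 20 for the married row and 30 and 30 for the single row. The observed counts are far from these, so the p-value falls below 0.05 and the null hypothesis of independence is rejected.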
