t-test_1

김지윤 · April 20, 2023

Scipy


🛻 Busan Metropolitan City: single-person households by year and sex

First, let's run some basic operations on this data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.set_printoptions(precision=5, suppress=True)  # print at most 5 decimal places, no scientific notation
filename = '성_및_연령별_1인가구__시군구_20230315141048.csv'
np_data = pd.read_csv(filename,encoding='cp949').to_numpy()
print(np_data)

#  [['부산광역시' '합계' 2015 164617 197132]
#   ['부산광역시' '합계' 2016 168035 204377]
#   ['부산광역시' '합계' 2017 176932 211967]
#   ['부산광역시' '합계' 2018 183579 220829]
#   ['부산광역시' '합계' 2019 191796 231431]
#   ['부산광역시' '합계' 2020 206311 248896]
#   ['부산광역시' '합계' 2021 222040 265322]]
  • Drop the '부산광역시' (Busan Metropolitan City) and '합계' (total) columns from np_data and cast the rest to np.int64
sub_data = np_data[:,2:].astype(np.int64)
print(sub_data)

#  [[  2015 164617 197132]
#   [  2016 168035 204377]
#   [  2017 176932 211967]
#   [  2018 183579 220829]
#   [  2019 191796 231431]
#   [  2020 206311 248896]
#   [  2021 222040 265322]]
  • Mean and deviation from the mean of male/female single-person households, 2015-2021
man_mean = np.mean(sub_data[:,1])
woman_mean = np.mean(sub_data[:,2])
print(man_mean)      # 187615.7142857143
print(woman_mean)    # 225707.7142857143

# Deviation of each year's count from its own group's mean.
# Use sub_data here: np_data[:,1] holds text labels, not counts.
man_std = sub_data[:,1] - man_mean
woman_std = sub_data[:,2] - woman_mean

print( man_std)
# [-22998.71429 -19580.71429 -10683.71429  -4036.71429   4180.28571
#   18695.28571  34424.28571]
print(woman_std)
# [-28575.71429 -21330.71429 -13740.71429  -4878.71429   5723.28571
#   23188.28571  39614.28571]
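The values above are per-year deviations from the mean, not standard deviations. As a minimal sketch, the spread of each group can be summarized with the sample standard deviation. The arrays are typed in from the printed sub_data, and the names man_dev, woman_dev, man_sd and woman_sd are introduced here only for illustration.

```python
import numpy as np

# Counts copied from the printed sub_data (columns 1 and 2), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])
woman = np.array([197132, 204377, 211967, 220829, 231431, 248896, 265322])

# Deviation of each year from the group's own mean; these sum to ~0.
man_dev = man - man.mean()
woman_dev = woman - woman.mean()

# Sample standard deviation: ddof=1 divides by n - 1 instead of n.
man_sd = man.std(ddof=1)
woman_sd = woman.std(ddof=1)

print(man_sd, woman_sd)
```

ddof=1 is used because the seven years are treated as a sample; np.std defaults to ddof=0, which divides by n.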
  • Correlation between male and female single-person households, 2015-2021
corr = np.corrcoef(sub_data[:,1],sub_data[:,2])
print(corr)

#  [[1.     0.9987]
#   [0.9987 1.    ]]

์ƒ๊ด€๊ณ„์ˆ˜๊ฐ€ 0.9987์ด๋ฏ€๋กœ ์—ฌ์ž 1์ธ๊ฐ€๊ตฌ์ˆ˜ ์ฆ๊ฐ€ --> ๋‚จ์ž 1์ธ ๊ฐ€๊ตฌ์ˆ˜๋„ ์ฆ๊ฐ€ํ•œ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค.
.
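For reference, the off-diagonal entry of np.corrcoef is the Pearson coefficient, which can be written out from its definition. This is a small sketch with the counts typed in from the printed sub_data; dx, dy and r are illustrative names.

```python
import numpy as np

# Counts copied from the printed sub_data (columns 1 and 2), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])
woman = np.array([197132, 204377, 211967, 220829, 231431, 248896, 265322])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from each group's mean.
dx = man - man.mean()
dy = woman - woman.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r)
print(np.corrcoef(man, woman)[0, 1])  # should agree with r
```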

  • Male/female share of single-person households by year, 2015-2021
year_sum = np.sum(sub_data[:,1:],axis=1)
print(year_sum)
# [361749 372412 388899 404408 423227 455207 487362]

man_per = sub_data[:,1] / year_sum
woman_per = sub_data[:,2] / year_sum 
print(man_per)     # [0.45506 0.45121 0.45496 0.45395 0.45318 0.45322 0.4556 ]
print(woman_per)   # [0.54494 0.54879 0.54504 0.54605 0.54682 0.54678 0.5444 ]


🛻 ttest

: Compares the means of two groups
: stats.ttest_ind(a,b)

  • Null hypothesis: the means of the two groups do not differ.
  • Alternative hypothesis: the means of the two groups differ.
  • At a 5% significance level, p-value < 0.05 → reject the null hypothesis
from scipy import stats
print(sub_data)

#  [[  2015 164617 197132]
#   [  2016 168035 204377]
#   [  2017 176932 211967]
#   [  2018 183579 220829]
#   [  2019 191796 231431]
#   [  2020 206311 248896]
#   [  2021 222040 265322]]
man = sub_data[:,1]
woman = sub_data[:,2]

stats.ttest_ind(man, woman)


p-value = 0.0086 < 0.05, so the null hypothesis is rejected.
» In other words, the difference between the two group means is statistically significant at the 5% level.
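To make the test less of a black box, the sketch below reads the statistic and p-value off the result and recomputes them by hand using the pooled-variance formula that stats.ttest_ind uses by default (equal_var=True). The arrays are typed in from the printed sub_data; pooled_var, t_manual and p_manual are illustrative names.

```python
import numpy as np
from scipy import stats

# Counts copied from the printed sub_data (columns 1 and 2), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])
woman = np.array([197132, 204377, 211967, 220829, 231431, 248896, 265322])

# Student's two-sample t-test (equal variances assumed by default).
result = stats.ttest_ind(man, woman)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

# Same statistic by hand: difference in means over the pooled standard error.
n1, n2 = len(man), len(woman)
pooled_var = ((n1 - 1) * man.var(ddof=1) + (n2 - 1) * woman.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (man.mean() - woman.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)
print(f"t = {t_manual:.3f}, p = {p_manual:.4f}")
```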

However, two conditions must be checked before running a t-test.

⌨️ Conditions for the t-test

  1. The population behind each sample follows a normal distribution.
    (normality checks: shapiro, anderson, kstest, Q-Q plot)

  2. The populations behind the samples have equal variances.
    (equal-variance tests: bartlett, levene)

⌨️ Normality tests

1. shapiro test

  • Null hypothesis: the population behind the sample follows a normal distribution.
stats.shapiro(man)
stats.shapiro(woman)

man : p-value = 0.74499 > 0.05, so the null hypothesis is not rejected
woman : p-value = 0.66956 > 0.05, so the null hypothesis is not rejected

» In other words, there is no evidence against normality for either the man or the woman sample, so we can proceed as if both populations are normal.
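A compact way to run the same check on both groups and print the decision next to each p-value is sketched below. The arrays are typed in from the printed sub_data; alpha and the results dict are illustrative names.

```python
import numpy as np
from scipy import stats

# Counts copied from the printed sub_data (columns 1 and 2), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])
woman = np.array([197132, 204377, 211967, 220829, 231431, 248896, 265322])

alpha = 0.05  # significance level used throughout the post
results = {}
for name, sample in (("man", man), ("woman", woman)):
    stat, p = stats.shapiro(sample)  # returns (W statistic, p-value)
    results[name] = (stat, p)
    verdict = "reject normality" if p < alpha else "normality not rejected"
    print(f"{name}: W = {stat:.4f}, p = {p:.5f} -> {verdict}")
```

With only seven observations per group the test has little power, so a large p-value here is weak support for normality rather than proof of it.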
2. anderson test

  • Null hypothesis: the population behind the sample follows the selected distribution.
  • anderson can test against several distributions
  • The default is the normal distribution
stats.anderson(man)
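By default, stats.anderson does not return a p-value. It returns a statistic together with critical values at several significance levels, and the null hypothesis is rejected at a given level when the statistic exceeds the matching critical value. A minimal sketch of reading the result follows; it assumes the result exposes statistic, critical_values and significance_level (given in percent), and the array is typed in from the printed sub_data.

```python
import numpy as np
from scipy import stats

# Male counts copied from the printed sub_data (column 1), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])

result = stats.anderson(man, dist="norm")
print(f"A^2 statistic = {result.statistic:.4f}")

# Compare the statistic against the critical value at each significance level.
for crit, level in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "do not reject"
    print(f"{level:>5}% level: critical value = {crit:.3f} -> {decision} normality")
```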

3. kstest test
: goodness of fit (tests whether the data match the selected distribution)

  • Null hypothesis: the data match the selected distribution
stats.kstest(man, stats.norm.cdf)

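p-value < 0.05, so the null hypothesis is rejected.
Note that stats.norm.cdf with no extra arguments is the standard normal distribution (mean 0, standard deviation 1). The raw counts are all far above 0, so this rejection reflects their scale rather than the shape of their distribution.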
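One way to make the comparison about shape rather than scale is to standardize the sample first, or equivalently to pass the sample mean and standard deviation through args. The sketch below shows both, with the array typed in from the printed sub_data. Because the parameters are estimated from the same data, the resulting p-value is only approximate and tends to be too large.

```python
import numpy as np
from scipy import stats

# Male counts copied from the printed sub_data (column 1), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])

# Raw counts against N(0, 1): rejected because of scale alone.
raw = stats.kstest(man, "norm")

# Standardize to mean 0 and standard deviation 1, then compare with N(0, 1).
z = (man - man.mean()) / man.std(ddof=1)
standardized = stats.kstest(z, "norm")

# Equivalent: keep the raw data and pass loc and scale to the normal CDF.
with_args = stats.kstest(man, "norm", args=(man.mean(), man.std(ddof=1)))

print(f"raw:          D = {raw.statistic:.4f}, p = {raw.pvalue:.4g}")
print(f"standardized: D = {standardized.statistic:.4f}, p = {standardized.pvalue:.4f}")
print(f"with args:    D = {with_args.statistic:.4f}, p = {with_args.pvalue:.4f}")
```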
4. Q-Q plot
stats.probplot
A Q-Q plot (quantile-quantile plot) is one way to check the normal-population assumption: it plots the quantiles of the collected data against the quantiles of the standard normal distribution.

If the population is normal, the points fall roughly along a straight line.

_, axe  = plt.subplots()
stats.probplot(man,plot=axe)
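stats.probplot also returns the plotted points and a least-squares line, so "how straight" the plot is can be read as a number without drawing anything. This is a small sketch with the array typed in from the printed sub_data; the unpacked names are illustrative.

```python
import numpy as np
from scipy import stats

# Male counts copied from the printed sub_data (column 1), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])

# Without plot=..., probplot only computes the values:
#   osm = theoretical normal quantiles, osr = ordered sample values,
#   slope/intercept = fitted line, r = correlation between osm and osr.
(osm, osr), (slope, intercept, r) = stats.probplot(man, dist="norm")

print(f"r = {r:.4f}")  # closer to 1 means closer to a straight line
```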

⌨️ Equal-variance tests
1. bartlett test

  • Null hypothesis: the two groups have equal variances (homoscedasticity).
stats.bartlett(man, woman)

» p-value = 0.6949 > 0.05, so the null hypothesis is not rejected
2. levene test

  • Null hypothesis: the two groups have equal variances (homoscedasticity).
stats.levene(man, woman)

» p-value = 0.6811 > 0.05, so the null hypothesis is not rejected
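The equal-variance result decides which form of the t-test applies. If equal variances had been rejected, Welch's t-test (equal_var=False in stats.ttest_ind) would be the usual alternative. The sketch below wires the two steps together; the arrays are typed in from the printed sub_data, and alpha and equal_var are illustrative names.

```python
import numpy as np
from scipy import stats

# Counts copied from the printed sub_data (columns 1 and 2), 2015-2021.
man = np.array([164617, 168035, 176932, 183579, 191796, 206311, 222040])
woman = np.array([197132, 204377, 211967, 220829, 231431, 248896, 265322])

alpha = 0.05
levene = stats.levene(man, woman)

# Keep the equal-variance assumption unless Levene's test rejects it.
equal_var = bool(levene.pvalue >= alpha)
result = stats.ttest_ind(man, woman, equal_var=equal_var)
print(f"equal_var={equal_var}: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

# Welch's t-test for comparison; it does not assume equal variances.
welch = stats.ttest_ind(man, woman, equal_var=False)
print(f"Welch: t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```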
