t-test _ 3 / t-test & ANOVA

김지윤 · April 25, 2023

Scipy

  • A t-test compares the means of two groups
  • ANOVA compares the means of three or more groups

‣ Data: traffic accidents by day/night - number of deaths

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
file_name = '주야별_교통사고_20230320214703.csv'
death_2011_2021 = pd.read_csv(file_name, encoding='cp949').to_numpy()
print(death_2011_2021[:5])

# result
# (시점 = period, 주야별 = day/night, 사망자수 (명) = deaths (persons), 주 = day, 야 = night)
# [['시점' '주야별(1)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)'
#   '사망자수 (명)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)' '사망자수 (명)'
#   '사망자수 (명)']
#  ['2011' '주' '186' '156' '143' '182' '185' '202' '225' '227' '217' '233'
#   '228' '250']
#  ['2011' '야' '209' '183' '195' '247' '208' '213' '237' '245' '257' '287'
#   '286' '228']
#  ['2012' '주' '206' '194' '179' '233' '218' '238' '198' '193' '233' '261'
#   '238' '196']
#  ['2012' '야' '212' '199' '224' '250' '226' '238' '218' '216' '253' '272'
#   '270' '227']]

🛻 What to test

1. Is there a difference between the mean daytime and nighttime deaths in 2021?
2. Do deaths differ by month over 2011 ~ 2021?


  • Write a helper function that runs the t-test, Shapiro, and Levene tests
def do_mean_comparison(g1, g2) :
    print('*************************')
    print('g1 shapiro :', stats.shapiro(g1))
    print('g2 shapiro :', stats.shapiro(g2))
    print('levene :', stats.levene(g1,g2))
    print('ttest_ind :', stats.ttest_ind(g1,g2))
    print('*************************')

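1. Is there a difference between the mean daytime and nighttime deaths in 2021?

  • Extract the 2021 data
# Extract the 2021 data
# Option 1: take the last two rows (2021 day and 2021 night)
death_2021 = death_2011_2021[-2:]

# Option 2: boolean mask on the year column
is_2021 = death_2011_2021[:,0] == '2021'
death_2021 = death_2011_2021[is_2021]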

print(death_2021)

# result
# [['2021' '주' '107' '107' '128' '119' '154' '124' '139' '129' '131' '156'
#   '163' '149']
#  ['2021' '야' '90' '96' '81' '93' '101' '103' '115' '118' '115' '156'
#   '122' '120']]
  • Change the data type
death_2021_day = death_2021[0,2:].astype(np.float64)
print('day:', death_2021_day)
# day: [107. 107. 128. 119. 154. 124. 139. 129. 131. 156. 163. 149.]

death_2021_night = death_2021[1,2:].astype(np.float64)
print('night:', death_2021_night)
# night: [ 90.  96.  81.  93. 101. 103. 115. 118. 115. 156. 122. 120.]
  • Run the t-test / Shapiro / Levene tests
do_mean_comparison(death_2021_day,death_2021_night)
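The helper only prints the raw results. One way to read them, sketched below with the usual 0.05 threshold: use Shapiro-Wilk to check normality, use Levene to check equal variances, and switch to Welch's t-test (`equal_var=False`) when Levene rejects. The function name `compare_means` and its `alpha` parameter are hypothetical and not part of the original post; the arrays repeat the 2021 values printed above.

```python
import numpy as np
from scipy import stats

# 2021 monthly deaths, copied from the printed arrays above
day = np.array([107, 107, 128, 119, 154, 124, 139, 129, 131, 156, 163, 149],
               dtype=np.float64)
night = np.array([90, 96, 81, 93, 101, 103, 115, 118, 115, 156, 122, 120],
                 dtype=np.float64)


def compare_means(g1, g2, alpha=0.05):
    """Hypothetical helper: run the three tests and return the p-values."""
    _, p_shapiro_1 = stats.shapiro(g1)   # H0: g1 is normally distributed
    _, p_shapiro_2 = stats.shapiro(g2)   # H0: g2 is normally distributed
    _, p_levene = stats.levene(g1, g2)   # H0: both groups have equal variance

    # Use the pooled-variance t-test only when Levene does not reject
    equal_var = bool(p_levene >= alpha)
    t_stat, p_ttest = stats.ttest_ind(g1, g2, equal_var=equal_var)

    return {
        'p_shapiro_1': p_shapiro_1,
        'p_shapiro_2': p_shapiro_2,
        'p_levene': p_levene,
        'equal_var': equal_var,
        't_stat': t_stat,
        'p_ttest': p_ttest,
    }


result = compare_means(day, night)
for name, value in result.items():
    print(name, ':', value)
```

A t-test p-value below 0.05 means the null hypothesis of equal day and night means is rejected.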

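  • 95% interval (mean ± 1.96 × standard deviation)
# Note: mean ± 1.96 * SD is the range expected to hold about 95% of the
# monthly values under a normal assumption. It is not a confidence
# interval for the mean, which would use the standard error instead.
def get_95_ci(vals) :
    vals_mean = np.mean(vals)
    vals_std = np.std(vals)
    print('upper : ', vals_mean + 1.96*vals_std)
    print('lower : ', vals_mean - 1.96*vals_std)

# The function prints its results and returns nothing, so call it directly
get_95_ci(death_2021_day)
get_95_ci(death_2021_night)

# upper :  168.83852850332045
# lower :  98.82813816334624
# upper :  146.36222494390665
# lower :  71.97110838942669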
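The `get_95_ci` helper above computes mean ± 1.96 × standard deviation, which describes where individual monthly counts are expected to fall. A confidence interval for the mean is narrower, because it uses the standard error (SD divided by √n) and, for a sample of only 12 months, the t distribution. The sketch below shows one way to compute it; `mean_ci` is a hypothetical name, not from the original post, and the array repeats the 2021 daytime values printed earlier.

```python
import numpy as np
from scipy import stats

# 2021 daytime monthly deaths, copied from the printed array above
day = np.array([107, 107, 128, 119, 154, 124, 139, 129, 131, 156, 163, 149],
               dtype=np.float64)


def mean_ci(vals, confidence=0.95):
    """Hypothetical helper: t-based confidence interval for the mean."""
    n = len(vals)
    mean = np.mean(vals)
    sem = stats.sem(vals)  # sample SD (ddof=1) divided by sqrt(n)
    lower, upper = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)
    return lower, upper


lower, upper = mean_ci(day)
print('lower :', lower)
print('upper :', upper)
```

The same call on the nighttime array gives its interval. Comparing the two intervals is only an informal check and does not replace the t-test.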

2. Do deaths differ by month over 2011 ~ 2021?

# Drop the leftover header row (first row of the array)
death_2011_2021 = death_2011_2021[1:,]
death_2011_2021[:5]

# result
# [['2011' '주' '186' '156' '143' '182' '185' '202' '225' '227' '217' '233'
#   '228' '250']
#  ['2011' '야' '209' '183' '195' '247' '208' '213' '237' '245' '257' '287'
#   '286' '228']
#  ['2012' '주' '206' '194' '179' '233' '218' '238' '198' '193' '233' '261'
#   '238' '196']
#  ['2012' '야' '212' '199' '224' '250' '226' '238' '218' '216' '253' '272'
#   '270' '227']
#  ['2013' '주' '190' '136' '198' '164' '215' '195' '221' '192' '211' '236'
#   '223' '209']]
  • Change the data type
death_2011_2021 = death_2011_2021[:,2:].astype(np.int64)
print(death_2011_2021[:5])

# result
# [[186 156 143 182 185 202 225 227 217 233 228 250]
#  [209 183 195 247 208 213 237 245 257 287 286 228]
#  [206 194 179 233 218 238 198 193 233 261 238 196]
#  [212 199 224 250 226 238 218 216 253 272 270 227]
#  [190 136 198 164 215 195 221 192 211 236 223 209]]
  • Compute the monthly means
mon_mean = np.mean(death_2011_2021, axis=0)
print(mon_mean)

# [169.77272727 143.77272727 159.13636364 165.77272727 176.36363636 170.27272727
#  175.13636364 177.18181818 190.81818182 212.86363636 196.5        185.13636364]

Because we need to test for a difference in means across three or more groups (12 months), we use ANOVA.

πŸ›» ANOVA

A test for differences in means among three or more groups

  • Null hypothesis: the group means do not differ.
  • p-value < 0.05: reject the null hypothesis; at least one group mean differs.
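To make the test statistic concrete, the sketch below computes the one-way ANOVA F value by hand (between-group mean square divided by within-group mean square) and compares it with `stats.f_oneway`. The three groups are hypothetical toy data chosen for illustration, not values from the traffic dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: three groups of four observations each
groups = [
    np.array([10.0, 12.0, 11.0, 13.0]),
    np.array([14.0, 15.0, 13.0, 16.0]),
    np.array([20.0, 19.0, 21.0, 22.0]),
]

all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()
k = len(groups)          # number of groups
n_total = all_vals.size  # total number of observations

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)  # upper tail of the F distribution

f_scipy, p_scipy = stats.f_oneway(*groups)

print('manual :', f_manual, p_manual)
print('scipy  :', f_scipy, p_scipy)
```

A large F means the group means are spread far apart relative to the noise inside each group, which is what produces a small p-value.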

.
.
.

Do the monthly means for 2011 ~ 2021 differ?

lista = [death_2011_2021[:,x] for x in range(12)]
stats.f_oneway(*lista)

» p-value < 0.05: reject the null hypothesis
» That is, the monthly means can be considered different.


Do nighttime deaths in 2011 ~ 2021 differ by month?

death_night = death_2011_2021[1::2, :]
print(death_night)
# [[209 183 195 247 208 213 237 245 257 287 286 228]
#  [212 199 224 250 226 238 218 216 253 272 270 227]
#  [199 199 211 216 205 241 223 220 219 263 256 250]
#  [200 175 205 188 172 182 211 176 245 247 227 218]
#  [207 167 203 186 187 188 180 215 188 229 221 198]
#  [177 157 156 193 137 159 171 161 188 245 201 212]
#  [170 151 168 146 173 144 187 165 214 222 176 193]
#  [155 153 160 165 138 118 144 188 179 169 135 159]
#  [144 102 128 134 131 134 109 122 137 169 128 161]
#  [132 115 115  96 118 143 124 134 137 147 116  86]
#  [ 90  96  81  93 101 103 115 118 115 156 122 120]]
lista = [death_night[:, x] for x in range(12)]
stats.f_oneway(*lista)

» p-value = 0.15835 > 0.05: fail to reject the null hypothesis
» For 2011 ~ 2021, nighttime deaths show no statistically significant difference between months.


Do daytime deaths in 2011 ~ 2021 differ by month?

death_day = death_2011_2021[::2, :]
print(death_day)
# 
listb = [death_day[:, x] for x in range(12)]
stats.f_oneway(*listb)

» p-value < 0.05: reject the null hypothesis
» That is, daytime deaths in 2011 ~ 2021 differ by month.
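The `[::2]` and `[1::2]` slices work only because the rows strictly alternate day, night, day, night. A slightly more defensive variant, sketched below, selects rows by the label column ('주' = day, '야' = night) before it is dropped. The array holds the first four data rows printed earlier; the variable names are illustrative and not from the original post.

```python
import numpy as np

# First four data rows, copied from the printed output (header row already removed)
rows = np.array([
    ['2011', '주', '186', '156', '143', '182', '185', '202', '225', '227', '217', '233', '228', '250'],
    ['2011', '야', '209', '183', '195', '247', '208', '213', '237', '245', '257', '287', '286', '228'],
    ['2012', '주', '206', '194', '179', '233', '218', '238', '198', '193', '233', '261', '238', '196'],
    ['2012', '야', '212', '199', '224', '250', '226', '238', '218', '216', '253', '272', '270', '227'],
])

labels = rows[:, 1]                      # day/night label for each row
counts = rows[:, 2:].astype(np.int64)    # monthly death counts

# Select by label instead of by row position
day_by_label = counts[labels == '주']
night_by_label = counts[labels == '야']

print(day_by_label)
print(night_by_label)
```

On this data the label-based selection matches the step slices, but it would still be correct if a row were missing or the order changed.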
