[Python Statistics Practice] One-Way ANOVA

robin · August 23, 2021


One-way ANOVA procedure

1) Test homogeneity of variance across groups

Levene's test

[Code]

import pingouin as pg
pg.homoscedasticity(dv = 'score', group = 'school', data = schools_df)

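[Interpreting the result]

p < .05 : Reject the null hypothesis; the group variances are not all equal.
p > .05 : Fail to reject the null hypothesis; the group variances are treated as equal.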
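
To make the statistic behind this decision concrete, here is a minimal sketch of Levene's W computed by hand with the standard library. It uses the median-centered form (the default of `scipy.stats.levene`); the helper name `levene_w` and the three score lists are hypothetical and not part of the original post. The p-value would come from an F distribution with (k - 1, N - k) degrees of freedom, which this sketch does not compute.

```python
from statistics import mean, median


def levene_w(*groups):
    """Median-centered Levene W statistic for two or more groups (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores for three schools (illustration only)
school_a = [70, 72, 74, 76, 78]
school_b = [60, 70, 80, 90, 100]
school_c = [71, 73, 75, 77, 90]
print("Levene W:", round(levene_w(school_a, school_b, school_c), 3))
```

A larger W means the groups differ more in spread; identical spreads give W = 0.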

2) One-way ANOVA

If the equal-variance assumption is met, use ANOVA

[Code]

pg.anova(dv = 'score', between = 'school', data = schools_df, detailed=True)
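
As a rough illustration of what the classical ANOVA table contains, the sketch below computes the F statistic and eta-squared by hand. The helper name `one_way_anova` and the toy groups are hypothetical; this is not pingouin's implementation, only the textbook formula F = MS_between / MS_within.

```python
from statistics import mean


def one_way_anova(*groups):
    """Classical one-way ANOVA: returns (F, df_between, df_within, eta_squared)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical toy groups (illustration only)
f_stat, df1, df2, eta_sq = one_way_anova([1, 2, 3], [2, 3, 4], [6, 7, 8])
print("F =", f_stat, "df =", (df1, df2), "eta squared =", eta_sq)
```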

If the equal-variance assumption is not met, use Welch's ANOVA

[Code]

pg.welch_anova(dv = 'score', between = 'school', data = schools_df)
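
Welch's ANOVA weights each group by n / s² instead of pooling the variances, which is why it tolerates unequal variances. The following sketch applies Welch's (1951) formula with the standard library; the helper name `welch_f` and the sample groups are hypothetical, and the p-value (from an F distribution with df1 and df2) is left out.

```python
from statistics import mean, variance


def welch_f(*groups):
    """Welch's one-way ANOVA: returns (F, df1, df2). Hypothetical helper."""
    k = len(groups)
    # Each group's weight is its size divided by its sample variance
    weights = [len(g) / variance(g) for g in groups]
    w_total = sum(weights)
    means = [mean(g) for g in groups]
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (len(g) - 1) for w, g in zip(weights, groups))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2


# Hypothetical groups with clearly different spreads (illustration only)
f_stat, df1, df2 = welch_f([70, 72, 74, 76, 78], [60, 70, 80, 90, 100], [71, 73, 75, 77, 90])
print("Welch F =", round(f_stat, 3), "df1 =", df1, "df2 =", round(df2, 3))
```

The fractional second degree of freedom is expected: it shrinks as the group variances become more unequal.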

[Interpreting the result]

p < .05 : Not all group means are equal. ➡️ Run a post-hoc test.
p > .05 : No significant difference among the group means was found. ➡️ Do not run a post-hoc test.

3) Post-hoc test

If the equal-variance assumption is met, use Tukey's test

[Code]

pg.pairwise_tukey(dv = 'score', between = 'school', data = schools_df)

If the equal-variance assumption is not met, use the Games-Howell test

[Code]

pg.pairwise_gameshowell(dv = 'score', between = 'school', data = schools_df)

[Interpreting the result]

p < .05 : The two group means differ significantly.
p > .05 : No significant difference between the two group means was found.
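
The three steps above amount to a small decision tree. The sketch below encodes it as plain Python; the helper names `choose_tests` and `needs_posthoc` and the example p-values are hypothetical, and the returned strings are simply the pingouin function names used in this post.

```python
def choose_tests(levene_p, alpha=0.05):
    """Pick the omnibus and post-hoc functions from the Levene p-value."""
    if levene_p < alpha:
        # Variances differ: use the variants that do not assume equal variances
        return ("pg.welch_anova", "pg.pairwise_gameshowell")
    return ("pg.anova", "pg.pairwise_tukey")


def needs_posthoc(anova_p, alpha=0.05):
    """A post-hoc test is run only when the omnibus test is significant."""
    return anova_p < alpha


# Hypothetical p-values (illustration only)
print(choose_tests(0.30), needs_posthoc(0.20))
print(choose_tests(0.001), needs_posthoc(0.001))
```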

[Practice]

0. Data preparation

Dataset: Kaggle online auctions data
https://www.kaggle.com/onlineauctions/online-auctions-dataset

Load the data

[Code]

import pandas as pd
import pingouin as pg

auction = pd.read_csv('auction.csv')

print(auction.shape)
print(auction.head(3))
print(auction.tail(3))

[Result]

auction.shape:
(10681, 9)

auction.head(3), auction.tail(3): [table output not preserved]

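1. Homoscedasticity test

Levene's test
- Null hypothesis: all groups have equal variance.
- if p-value < .05: reject the null hypothesis. The group variances are not all equal. ➡️ use pg.welch_anova
- if p-value > .05: fail to reject the null hypothesis. The group variances are treated as equal. ➡️ use pg.anova

[Code]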

pg.homoscedasticity(dv = 'openbid', group = 'item', data = auction)

[Result]

	W	pval	equal_var
levene	471.159381	8.101034e-197	False

Levene's test showed that the equal-variance assumption was not met, W = 471.16, p < .05.

2. One-way ANOVA

Purpose of the analysis

: To find out whether the mean open bid differs significantly across items.

(1) Mean and standard deviation of each group

[Code]

watch = auction[auction['item']=='Cartier wristwatch']['openbid']
PDA = auction[auction['item']=='Palm Pilot M515 PDA']['openbid']
console = auction[auction['item']=='Xbox game console']['openbid']

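# Round to two decimals so the printed values match the result shown below
print('watch mean price:', round(watch.mean(), 2), 'watch standard deviation:', round(watch.std(), 2))
print('PDA mean price:', round(PDA.mean(), 2), 'PDA standard deviation:', round(PDA.std(), 2))
print('console mean price:', round(console.mean(), 2), 'console standard deviation:', round(console.std(), 2))

[Result]
watch mean price: 153.44 watch standard deviation: 360.69
PDA mean price: 31.56 PDA standard deviation: 60.37
console mean price: 25.48 console standard deviation: 32.68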

(2) One-way ANOVA

Because the equal-variance assumption is not met, Welch's ANOVA is used.

[Code]

pg.welch_anova(dv = 'openbid', between = 'item', data = auction)

[Result]

	Source	ddof1	ddof2	F	p-unc	np2
0	item	2	4259.671584	136.580677	3.221606e-58	0.080984

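A one-way ANOVA was run to test whether the mean open bid differs across the three items. Because the p-value is below .05 (Welch's F(2, 4259.67) = 136.58, p < .001), the difference in mean open bid between groups was statistically significant.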

3. Post-hoc test

A post-hoc test was run to find out which pairs of groups differ in mean open bid. Because the equal-variance assumption is not met, the Games-Howell test was used.

[Code]

pg.pairwise_gameshowell(dv = 'openbid', between = 'item', data = auction)

[Result]

	A	B	mean(A)	mean(B)	diff	se	T	df	pval	hedges
0	Cartier wristwatch	Palm Pilot M515 PDA	153.437184	31.560857	121.876327	8.199491	14.863889	1988.208564	0.001	0.387861
1	Cartier wristwatch	Xbox game console	153.437184	25.483404	127.953779	8.185081	15.632562	1974.279546	0.001	0.460433
2	Palm Pilot M515 PDA	Xbox game console	31.560857	25.483404	6.077452	0.997896	6.090267	8587.124045	0.001	0.139500

The Games-Howell post-hoc test showed that the mean open bid differed significantly between watch and PDA, between watch and console, and between PDA and console.
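
As a quick consistency check on the Games-Howell table above, the sketch below recomputes two of its columns from the others. The relationships diff = mean(A) - mean(B) and T = diff / se are inferred from the printed numbers themselves, not taken from the pingouin documentation, so treat them as an assumption that happens to match this output.

```python
# Values copied from the Games-Howell result table:
# (A, B, mean(A), mean(B), diff, se, T)
rows = [
    ("Cartier wristwatch", "Palm Pilot M515 PDA", 153.437184, 31.560857, 121.876327, 8.199491, 14.863889),
    ("Cartier wristwatch", "Xbox game console", 153.437184, 25.483404, 127.953779, 8.185081, 15.632562),
    ("Palm Pilot M515 PDA", "Xbox game console", 31.560857, 25.483404, 6.077452, 0.997896, 6.090267),
]

for a, b, mean_a, mean_b, diff, se, t in rows:
    recomputed_diff = mean_a - mean_b  # difference of the two group means
    recomputed_t = diff / se           # mean difference in units of its standard error
    print(f"{a} vs {b}: diff = {recomputed_diff:.4f}, T = {recomputed_t:.4f}")
```

The PDA versus console pair shows why a small mean difference (about 6) can still be significant: its standard error is under 1, so T is still about 6.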
