Statistical Methods W6

ese2o · June 15, 2024

Analysis of Variance and Design of Experiments

Design of Experiments

Experimental design: a plan and a structure for testing hypotheses in which the experimenter controls or manipulates one or more variables.

ANOVA

Analysis of Variance: a method for analyzing the variation in the dependent variable responses (measurements, data), which are not all the same in a given study.

Three experimental designs analyzed with ANOVA
1. Completely randomized design (One-way ANOVA)
2. Randomized block design
3. Factorial experiments (Two-way ANOVA)

The Completely Randomized Design (CRD)

  • One independent variable
  • The independent variable has two or more treatment levels (or classifications)
    • If there are only two treatment levels, the t test covered earlier can be used instead.

One-way ANOVA

A hypothesis testing technique used to compare the means of three or more populations when there is only one independent variable.
One-way ANOVA analyzes all the sample means at once and thus prevents the buildup of the Type I error rate that occurs when every pair of means is tested with a separate t test.
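To see how quickly the error rate builds up when every pair is tested separately, here is a rough sketch. It assumes the pairwise t tests are independent (they are not exactly, so the numbers are only an approximation) and computes the chance of at least one false rejection, $1-(1-\alpha)^m$, where $m=\binom{C}{2}$ is the number of pairs. The function name is illustrative.

```python
from math import comb


def familywise_error_rate(alpha: float, num_groups: int) -> float:
    """Approximate P(at least one Type I error) when every pair of groups
    is compared with its own t test at level alpha.

    Assumes the pairwise tests are independent, which is only an approximation.
    """
    num_tests = comb(num_groups, 2)  # number of pairwise t tests
    return 1 - (1 - alpha) ** num_tests


for groups in (3, 4, 5):
    rate = familywise_error_rate(0.05, groups)
    print(f"C={groups}: {comb(groups, 2)} t tests, error rate ~ {rate:.3f}")
```

With four groups the approximate rate is already about 0.26, while a single one-way ANOVA F test keeps it at the chosen $\alpha$.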

SSE (Error Sum of Squares): the error variance, i.e., the portion of the total variance not explained by the treatment
SSC (Treatment Sum of Squares): the variance resulting from the treatment (columns)
SST (Total Sum of Squares): the total variance, SST = SSC + SSE

$$\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2=\sum_{j=1}^C n_j\left(\bar{x}_j-\bar{x}\right)^2+\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2$$

$C$: number of treatment levels
$j$: index for each treatment level
$n_j$: number of observations in a given treatment level
$i$: index for each member of a treatment level
$\bar{x}$: grand (total) mean
$\bar{x}_j$: mean of a treatment group or level
$x_{ij}$: individual value
$N$: total number of observations, $N=\sum_{j=1}^C n_j$
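A minimal Python sketch of the decomposition above, computing SST, SSC and SSE directly from raw observations. The data set and the function name `sums_of_squares` are hypothetical, chosen so the numbers are easy to check by hand; the final assertion confirms SST = SSC + SSE.

```python
def sums_of_squares(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return (SST, SSC, SSE) for a one-way layout; groups[j] holds the x_ij of level j."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)  # x-bar

    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssc = 0.0
    sse = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)  # x-bar_j
        ssc += len(group) * (group_mean - grand_mean) ** 2  # n_j (x-bar_j - x-bar)^2
        sse += sum((x - group_mean) ** 2 for x in group)    # deviations within level j
    return sst, ssc, sse


# Hypothetical data: C = 3 treatment levels, n_j = 3 each
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sst, ssc, sse = sums_of_squares(data)
print(sst, ssc, sse)
assert abs(sst - (ssc + sse)) < 1e-9  # SST = SSC + SSE
```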

Mean square of columns (treatment):

$$\mathrm{MSC}=\frac{\mathrm{SSC}}{C-1}$$

Mean square error:

$$\mathrm{MSE}=\frac{\mathrm{SSE}}{N-C}$$

F: the ratio of the treatment variance to the error variance

$$F=\frac{\mathrm{MSC}}{\mathrm{MSE}}$$

Degrees of freedom (df):

$$\begin{aligned} & (\mathrm{df})_C=C-1 \\ & (\mathrm{df})_E=N-C \\ & (\mathrm{df})_T=N-1 \end{aligned}$$
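The formulas for df, MSC, MSE and F fit in one small helper. This is a sketch with a hypothetical name (`anova_from_sums`), and the sums of squares in the usage line are illustrative values, not data from these notes.

```python
def anova_from_sums(ssc: float, sse: float, n_total: int, num_levels: int) -> dict:
    """Degrees of freedom, mean squares and F ratio for one-way ANOVA."""
    df_c = num_levels - 1        # (df)_C = C - 1
    df_e = n_total - num_levels  # (df)_E = N - C
    df_t = n_total - 1           # (df)_T = N - 1
    msc = ssc / df_c             # MSC = SSC / (C - 1)
    mse = sse / df_e             # MSE = SSE / (N - C)
    return {"df_c": df_c, "df_e": df_e, "df_t": df_t,
            "msc": msc, "mse": mse, "f": msc / mse}


# Hypothetical sums of squares for N = 9 observations in C = 3 levels
print(anova_from_sums(ssc=54.0, sse=6.0, n_total=9, num_levels=3))
```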

Step 1.

$$H_0: \mu_1=\mu_2=\cdots=\mu_C$$
$H_a$: at least one mean is different from the others

Step 2.

$$\mathrm{SSC}=\sum_{j=1}^C n_j\left(\bar{x}_j-\bar{x}\right)^2$$
$$\mathrm{SSE}=\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2$$
$$\mathrm{SST}=\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2$$
Example (valve openings by machine operator: $C=4$ operators, $N=24$ observations, $n_j = 5, 8, 7, 4$, $\bar{x}=6.339583$):

$$\begin{aligned}
\mathrm{SSC}=\sum_{j=1}^C n_j\left(\bar{x}_j-\bar{x}\right)^2 = & \left[5(6.318-6.339583)^2+8(6.2775-6.339583)^2\right. \\
& \left.+7(6.488571-6.339583)^2+4(6.230-6.339583)^2\right] \\
= & 0.00233+0.03083+0.15538+0.04803 \\
= & 0.23658 \\
\mathrm{SSE}=\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2 = & \left[(6.33-6.318)^2+(6.26-6.318)^2+(6.31-6.318)^2\right. \\
& +(6.29-6.318)^2+(6.40-6.318)^2+(6.26-6.2775)^2 \\
& \left.+(6.36-6.2775)^2+\cdots+(6.19-6.230)^2+(6.21-6.230)^2\right] \\
= & 0.15492 \\
\mathrm{SST}=\sum_{j=1}^C \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2 = & \left[(6.33-6.339583)^2+(6.26-6.339583)^2\right. \\
& +(6.31-6.339583)^2+\cdots+(6.19-6.339583)^2 \\
& \left.+(6.21-6.339583)^2\right] \\
= & 0.39150
\end{aligned}$$
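The SSC line of this example can be checked from the group sizes and means alone. The sketch below uses only the numbers shown in the SSC computation (the raw observations are not all listed in these notes, so SSE and SST are not recomputed); the grand mean is taken as the size-weighted average of the group means. The function name is illustrative.

```python
def ssc_from_summary(sizes: list[int], means: list[float]) -> float:
    """SSC = sum_j n_j (x-bar_j - x-bar)^2, using only group sizes and means."""
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total  # weighted mean
    return sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))


# Valve-opening example: n_j and x-bar_j for the four operators
sizes = [5, 8, 7, 4]
means = [6.318, 6.2775, 6.488571, 6.230]
print(round(ssc_from_summary(sizes, means), 5))
```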

Step 3.

$$\begin{aligned}
\mathrm{df}_C & =C-1=4-1=3 \\
\mathrm{df}_E & =N-C=24-4=20 \\
\mathrm{df}_T & =N-1=24-1=23 \\
\mathrm{MSC} & =\frac{\mathrm{SSC}}{\mathrm{df}_C}=\frac{0.23658}{3}=0.078860 \\
\mathrm{MSE} & =\frac{\mathrm{SSE}}{\mathrm{df}_E}=\frac{0.15492}{20}=0.007746 \\
F & =\frac{\mathrm{MSC}}{\mathrm{MSE}}=\frac{0.078860}{0.007746}=10.18
\end{aligned}$$

Step 4. ANOVA Table

$$\begin{array}{lrccc}
\text{Source of Variance} & \text{df} & \text{SS} & \text{MS} & F \\ \hline
\text{Between} & 3 & 0.23658 & 0.078860 & 10.18 \\
\text{Error} & 20 & 0.15492 & 0.007746 & \\
\text{Total} & 23 & 0.39150 & &
\end{array}$$

Step 5.

  • The observed F value of 10.18 is larger than the critical F value of 3.10 ($F_{0.05,3,20}$).
  • $H_0$ is rejected.
  • Not all means are equal: there is a significant difference in the mean valve openings among machine operators.
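A sketch of Steps 3 to 5 with the numbers from this example. The critical value 3.10 is the table value $F_{0.05,3,20}$ quoted above; it is passed in rather than computed, because the Python standard library has no F distribution (with SciPy, `scipy.stats.f.ppf(0.95, 3, 20)` should give roughly the same value). The function name is hypothetical.

```python
def one_way_anova_decision(ssc, sse, n_total, num_levels, f_critical):
    """Build the ANOVA table rows and decide whether to reject H0."""
    df_c, df_e = num_levels - 1, n_total - num_levels
    msc, mse = ssc / df_c, sse / df_e
    f_obs = msc / mse
    rows = [
        ("Between", df_c, ssc, msc, f_obs),
        ("Error", df_e, sse, mse, None),
        ("Total", n_total - 1, ssc + sse, None, None),  # SST = SSC + SSE
    ]
    return rows, f_obs, f_obs > f_critical


rows, f_obs, reject = one_way_anova_decision(0.23658, 0.15492, 24, 4, f_critical=3.10)
for name, df, ss, ms, f in rows:
    ms_txt = f"{ms:.6f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{name:<8}{df:>4}{ss:>10.5f}{ms_txt:>11}{f_txt:>7}")
print("Reject H0" if reject else "Fail to reject H0")
```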

Multiple Comparison Tests

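If the one-way ANOVA rejects the null hypothesis and we conclude that the population means are not all equal, further analysis is needed to determine which populations differ. This is called multiple comparison.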

Goal: determine from the data which pairs of means are significantly different.
One approach is to pair up the population means two at a time and construct a confidence interval for each difference of means.

Tukey's Honestly Significant Difference (HSD) test

Considers confidence intervals for the differences of means over all possible pairs. For example, comparing 5 populations requires $\binom{5}{2}=10$ comparisons.

  • Tukey’s HSD test requires equal sample sizes for all treatments
  • Tukey’s HSD test uses the studentized range distribution

Studentized range (q distribution): the difference between the largest and smallest values in a sample, divided by an estimate of the standard deviation

Step 1.

Compute the critical value $q_{\alpha, C, N-C}$ of the studentized range distribution with $C$ groups and $N-C$ degrees of freedom:

$$\Pr\left(X \geq q_{\alpha, C, N-C}\right)=\alpha$$

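$$q_{C, N-C}=\frac{\bar{x}_{\max }-\bar{x}_{\min }}{s_p / \sqrt{n}}$$

where $\bar{x}_{\max}$ and $\bar{x}_{\min}$ are the largest and smallest group means, $n$ is the common group size, and $s_p$ is the pooled standard deviation. For two groups:

$$\begin{aligned} & s_1^2=\frac{1}{n_1-1} \sum_{i=1}^{n_1}\left(x_{1, i}-\bar{x}_1\right)^2 \\ & s_2^2=\frac{1}{n_2-1} \sum_{i=1}^{n_2}\left(x_{2, i}-\bar{x}_2\right)^2 \end{aligned}$$

$$s_p^2=\frac{\left(n_1-1\right) s_1^2+\left(n_2-1\right) s_2^2}{n_1+n_2-2}$$

With $C$ groups the pooled variance becomes $s_p^2=\sum_{j=1}^{C}(n_j-1)s_j^2/(N-C)=\mathrm{SSE}/(N-C)$, which is exactly MSE.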
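A small sketch of the statistic above. It pools the within-group variances using the $C$-group form $\sum_j (n_j-1)s_j^2/(N-C)$ (which reduces to the two-group formula when $C=2$) and divides the range of the group means by $s_p/\sqrt{n}$. It assumes equal group sizes $n$; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(groups: list[list[float]]) -> float:
    """s_p^2 = sum_j (n_j - 1) s_j^2 / (N - C); for two groups this is the formula above."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)  # variance() divides by n - 1
    denominator = sum(len(g) for g in groups) - len(groups)      # N - C
    return numerator / denominator


def studentized_range(groups: list[list[float]]) -> float:
    """q = (largest mean - smallest mean) / (s_p / sqrt(n)), assuming equal n."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [mean(g) for g in groups]
    s_p = sqrt(pooled_variance(groups))
    return (max(means) - min(means)) / (s_p / sqrt(n))


data = [[1, 2, 3], [2, 3, 4], [7, 8, 9]]  # hypothetical, n = 3 per group
print(pooled_variance(data), studentized_range(data))
```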

Step 2.

Compute the observed value for each pair of groups $(i, j)$, where $n$ is the common sample size:

$$q_s(i, j)=\frac{\left|\bar{x}_i-\bar{x}_j\right|}{\sqrt{\mathrm{MSE} / n}}$$

Step 3.

Compare $q_s$ with $q_{\alpha, C, N-C}$.

If $q_s(i, j)>q_{\alpha, C, N-C}$, the means of groups $i$ and $j$ are significantly different.

Step 4.

Repeat the test for all pairs $(i, j)$ of treatments.
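A sketch of Steps 2 to 4 for equal sample sizes. The critical value $q_{\alpha,C,N-C}$ is an argument to be read from a studentized range table, since the Python standard library does not provide that distribution (recent SciPy versions include `scipy.stats.studentized_range`). The data are hypothetical, and 4.34 is used as the table value of $q_{0.05,3,6}$; verify it against your own table.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_hsd(groups: dict[str, list[float]], q_critical: float) -> list[tuple[str, str, float, bool]]:
    """Tukey's HSD for equal sample sizes; returns (i, j, q_s, significant) per pair."""
    sizes = {len(g) for g in groups.values()}
    if len(sizes) != 1:
        raise ValueError("Tukey's HSD needs equal sample sizes; use Tukey-Kramer instead")
    n = sizes.pop()
    n_total = n * len(groups)
    means = {name: mean(g) for name, g in groups.items()}
    # MSE = SSE / (N - C)
    sse = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    mse = sse / (n_total - len(groups))

    results = []
    for i, j in combinations(groups, 2):  # all C(C, 2) pairs
        q_s = abs(means[i] - means[j]) / sqrt(mse / n)
        results.append((i, j, q_s, q_s > q_critical))
    return results


data = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [7, 8, 9]}  # hypothetical
for i, j, q_s, significant in tukey_hsd(data, q_critical=4.34):
    print(f"{i} vs {j}: q_s = {q_s:.3f}, significant = {significant}")
```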

Tukey-Kramer Procedure

Used when the sample sizes are unequal.

Step 1. (same as in Tukey's HSD)

Compute the critical value $q_{\alpha, C, N-C}$:

$$\Pr\left(X \geq q_{\alpha, C, N-C}\right)=\alpha$$

Step 2.

Compare $\left|\bar{x}_i-\bar{x}_j\right|$ with $q_{\alpha, C, N-C} \sqrt{\frac{\mathrm{MSE}}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$.

If $\left|\bar{x}_i-\bar{x}_j\right|$ is larger, the means of groups $i$ and $j$ are significantly different.

Step 3.

Repeat the test for all pairs $(i, j)$ of treatments.
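Applied to the valve-opening example from the one-way ANOVA section (group sizes, group means and MSE = 0.007746 are taken from that example), a Tukey-Kramer sketch might look like this. The critical value 3.96 is used as the table value of $q_{0.05,4,20}$; it is an input, not computed, so check it against your own table. The function name is illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer(sizes, means, mse, q_critical):
    """Yield (i, j, |mean_i - mean_j|, threshold, significant) for every pair of groups."""
    for i, j in combinations(range(len(means)), 2):
        diff = abs(means[i] - means[j])
        # q * sqrt(MSE/2 * (1/n_i + 1/n_j))
        threshold = q_critical * sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))
        yield i + 1, j + 1, diff, threshold, diff > threshold  # 1-based group labels


sizes = [5, 8, 7, 4]
means = [6.318, 6.2775, 6.488571, 6.230]
for i, j, diff, threshold, significant in tukey_kramer(sizes, means, mse=0.007746, q_critical=3.96):
    print(f"operator {i} vs {j}: diff = {diff:.4f}, threshold = {threshold:.4f}, significant = {significant}")
```

With these inputs only the pairs involving operator 3 exceed their thresholds, which suggests that operator 3's mean valve opening differs from the others.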

