Analysis of Variance and Design of Experiments
Design of Experiments
Experimental Design: a plan and a structure to test hypotheses in which the experimenter either controls or manipulates one or more variables
ANOVA
Analysis of Variance
A technique for analyzing why the dependent variable responses (measurements, data) are not all the same in a given study
Three types of ANOVA designs
1. Completely randomized design (One-way ANOVA)
2. Randomized block design
3. Factorial experiments (Two-way ANOVA)
The Completely Randomized Design (CRD)
One independent variable
The independent variable has two or more treatment levels (or classifications)
If there are only two treatment levels, a t test can be used as before (see the sketch below).
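As a quick illustration of the two-level case (a minimal SciPy sketch with made-up measurement values, not the example data used later), an independent-samples t test and a one-way ANOVA on the same two groups reach the same conclusion, since F = t²:

```python
from scipy import stats

# Hypothetical measurements for two treatment levels (illustration only)
level_1 = [5.1, 4.8, 5.3, 5.0, 4.9]
level_2 = [5.6, 5.9, 5.7, 6.0, 5.8, 5.5]

t_stat, t_p = stats.ttest_ind(level_1, level_2)   # pooled-variance t test
f_stat, f_p = stats.f_oneway(level_1, level_2)    # one-way ANOVA with two groups

print(t_stat**2, f_stat)   # F equals t squared (up to rounding)
print(t_p, f_p)            # identical p-values
```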
One-way ANOVA
: a hypothesis-testing technique used to compare the means of three or more populations when there is only one independent variable
One-way ANOVA analyzes all the sample means at one time and thus precludes the buildup of Type I error that would result from running many pairwise t tests.
SSE (Error Sum of Squares): the error variance, or that portion of the total variance unexplained by the treatment
SSC (Treatment Sum of Squares): the variance resulting from the treatment (columns)
SST (Total Sum of Squares): SST = SSC + SSE
$$\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2=\sum_{j=1}^{C} n_j\left(\bar{x}_j-\bar{x}\right)^2+\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2$$
$C$: number of treatment levels
$j$: index for each treatment level
$n_j$: number of observations in a given treatment level
$i$: index for each member of a treatment level
$\bar{x}$: grand (total) mean
$\bar{x}_j$: mean of a treatment group or level
$x_{ij}$: individual value
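As a minimal sketch (the function and variable names below are my own, not from the notes), this partition can be computed directly from grouped data:

```python
import numpy as np

def sums_of_squares(groups):
    """Partition the total variation of a one-way layout: SST = SSC + SSE.

    groups: list of 1-D sequences, one per treatment level (column).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()                                    # x-bar
    ssc = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # treatment
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # error
    sst = ((all_values - grand_mean) ** 2).sum()                      # total
    return ssc, sse, sst
```

For any input, `ssc + sse` equals `sst` up to floating-point rounding, which is exactly the identity above.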
the mean square of columns
$$\mathrm{MSC}=\frac{\mathrm{SSC}}{C-1}$$
the mean square error
$$\mathrm{MSE}=\frac{\mathrm{SSE}}{N-C}$$
ratio of the treatment variance to the error variance
$$F=\frac{\mathrm{MSC}}{\mathrm{MSE}}$$
df
$$\begin{aligned} (\mathrm{df})_C&=C-1 \\ (\mathrm{df})_E&=N-C \\ (\mathrm{df})_T&=N-1 \end{aligned}$$
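Continuing the sketch (again with hypothetical naming), the mean squares and the F ratio follow directly from the sums of squares and these degrees of freedom:

```python
def one_way_f(ssc, sse, n_total, n_levels):
    """Mean squares and F ratio for a completely randomized design."""
    df_c = n_levels - 1           # (df)_C = C - 1
    df_e = n_total - n_levels     # (df)_E = N - C
    msc = ssc / df_c              # treatment mean square
    mse = sse / df_e              # error mean square
    return msc, mse, msc / mse    # F = MSC / MSE
```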
Step 1.
$$H_0: \mu_1=\mu_2=\cdots=\mu_k$$
$$H_a: \text{at least one mean is different from the others}$$
Step 2.
$$\mathrm{SSC}=\sum_{j=1}^{C} n_j\left(\bar{x}_j-\bar{x}\right)^2$$
$$\mathrm{SSE}=\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2$$
$$\mathrm{SST}=\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2$$
$$\begin{aligned}
\mathrm{SSC}=\sum_{j=1}^{C} n_j\left(\bar{x}_j-\bar{x}\right)^2 &= 5(6.318-6.339583)^2+8(6.2775-6.339583)^2 \\
&\quad +7(6.488571-6.339583)^2+4(6.230-6.339583)^2 \\
&= 0.00233+0.03083+0.15538+0.04803 \\
&= 0.23658 \\
\mathrm{SSE}=\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2 &= (6.33-6.318)^2+(6.26-6.318)^2+(6.31-6.318)^2 \\
&\quad +(6.29-6.318)^2+(6.40-6.318)^2+(6.26-6.2775)^2 \\
&\quad +(6.36-6.2775)^2+\ldots+(6.19-6.230)^2+(6.21-6.230)^2 \\
&= 0.15492 \\
\mathrm{SST}=\sum_{j=1}^{C} \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2 &= (6.33-6.339583)^2+(6.26-6.339583)^2 \\
&\quad +(6.31-6.339583)^2+\ldots+(6.19-6.339583)^2 \\
&\quad +(6.21-6.339583)^2 \\
&= 0.39150
\end{aligned}$$
Step 3.
$$\begin{aligned}
\mathrm{df}_C &= C-1 = 4-1 = 3 \\
\mathrm{df}_E &= N-C = 24-4 = 20 \\
\mathrm{df}_T &= N-1 = 24-1 = 23 \\
\mathrm{MSC} &= \frac{\mathrm{SSC}}{\mathrm{df}_C} = \frac{0.23658}{3} = 0.078860 \\
\mathrm{MSE} &= \frac{\mathrm{SSE}}{\mathrm{df}_E} = \frac{0.15492}{20} = 0.007746 \\
F &= \frac{0.078860}{0.007746} = 10.18
\end{aligned}$$
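Plugging the Step 2 sums of squares into these formulas reproduces the figures above (a small arithmetic check in Python):

```python
ssc, sse = 0.23658, 0.15492      # sums of squares from Step 2
n_total, n_levels = 24, 4        # N and C

df_c, df_e = n_levels - 1, n_total - n_levels   # 3 and 20
msc, mse = ssc / df_c, sse / df_e               # 0.078860 and 0.007746
print(round(msc / mse, 2))                      # F is about 10.18
```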
Step 4. ANOVA Table
$$\begin{array}{lrccc}
\text{Source of Variance} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Between} & 3 & 0.23658 & 0.078860 & 10.18 \\
\text{Error} & 20 & 0.15492 & 0.007746 & \\
\text{Total} & 23 & 0.39150 & &
\end{array}$$
Step 5.
The observed F value of 10.18 is larger than the critical F value of 3.10 ($F_{0.05,3,20}$).
$H_0$ is rejected.
The result indicates that not all means are equal; there is a significant difference in the mean valve openings across machine operators.
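The critical value 3.10 can be verified with SciPy's F distribution (a quick sketch, assuming SciPy is available):

```python
from scipy import stats

alpha, df_c, df_e = 0.05, 3, 20
f_crit = stats.f.ppf(1 - alpha, df_c, df_e)   # upper-tail critical value
print(round(f_crit, 2))                       # about 3.10
print(10.18 > f_crit)                         # True, so H0 is rejected
```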
Multiple Comparison Tests
If the null hypothesis of a one-way ANOVA is rejected, so that we conclude that differences exist among the population means, a further analysis is needed to determine which populations actually differ. This is called multiple comparison.
Multiple comparison tests determine from the data which pairs of means are significantly different.
One approach is to pair up the population means and construct a confidence interval for each difference between two means.
Tukey's Honestly Significant Difference (HSD) test
It considers confidence intervals for the mean differences of all possible pairs. For example, multiple comparison of five populations requires $_5C_2 = 10$ pairwise comparisons.
Tukey’s HSD test requires equal sample sizes for all treatments
Tukey’s HSD test uses the studentized range distribution
The studentized range (q distribution) is the difference between the largest and smallest values in a sample, divided by the sample standard deviation.
Step 1.
compute the critical value $q_{\alpha, C, N-C}$
$$\Pr\left(X \geq q_{\alpha, C, N-C}\right)=\alpha$$
for $C$ groups and $N-C$ degrees of freedom, where the studentized range statistic is
$$q_{C, N-C}=\frac{\bar{y}_{\max }-\bar{y}_{\min }}{s_p / \sqrt{n}}$$
where
$$s_1^2=\frac{1}{n_1-1} \sum_{i=1}^{n_1}\left(x_{1, i}-\bar{x}_1\right)^2, \qquad s_2^2=\frac{1}{n_2-1} \sum_{i=1}^{n_2}\left(x_{2, i}-\bar{x}_2\right)^2$$
$$s_p^2=\frac{\left(n_1-1\right) s_1^2+\left(n_2-1\right) s_2^2}{n_1+n_2-2}$$
Step 2.
compute the observed value
$$q_s(i, j)=\frac{\left|\bar{x}_i-\bar{x}_j\right|}{\sqrt{\mathrm{MSE}/n}}$$
Step 3.
compare $q_s$ and $q_{\alpha, C, N-C}$
If $q_s(i, j)>q_{\alpha, C, N-C}$, the means of groups $i$ and $j$ are significantly different.
Step 4.
Test for all pairs (i,j) of treatments
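A minimal sketch of Steps 1-4 in Python (the function names are mine; it assumes SciPy >= 1.7, which provides scipy.stats.studentized_range for the q distribution):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_hsd(groups, mse, alpha=0.05):
    """Tukey's HSD for equal-sized treatment groups.

    groups: list of equal-length 1-D sequences; mse: error mean square from the ANOVA.
    Returns the critical q and a list of (i, j, q_observed, significant) tuples.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    c = len(groups)                                       # number of treatment levels
    n = len(groups[0])                                    # common sample size per level
    df_e = c * n - c                                      # N - C
    q_crit = studentized_range.ppf(1 - alpha, c, df_e)    # Step 1: critical value
    means = [g.mean() for g in groups]
    results = []
    for i in range(c):
        for j in range(i + 1, c):
            q_obs = abs(means[i] - means[j]) / np.sqrt(mse / n)   # Step 2: observed value
            results.append((i, j, q_obs, q_obs > q_crit))         # Step 3: compare
    return q_crit, results                                # Step 4: all pairs covered
```

If statsmodels is available, its pairwise_tukeyhsd function (in statsmodels.stats.multicomp) provides a packaged version of the same test.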
Tukey-Kramer Procedure
When the sample sizes are unequal
Step 1. same
compute the critical value $q_{\alpha, C, N-C}$
$$\Pr\left(X \geq q_{\alpha, C, N-C}\right)=\alpha$$
Step 2.
compare $\left|\bar{x}_i-\bar{x}_j\right|$ with $q_{\alpha, C, N-C} \sqrt{\frac{\mathrm{MSE}}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$
If $\left|\bar{x}_i-\bar{x}_j\right|$ is larger, the means of groups $i$ and $j$ are significantly different.
Step 3.
Test for all pairs (i,j) of treatments
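The HSD sketch above adapts to unequal sample sizes by replacing the denominator with the Tukey-Kramer term from Step 2 (again hypothetical naming, SciPy >= 1.7 assumed):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_kramer(groups, mse, alpha=0.05):
    """Tukey-Kramer comparisons for treatment groups of unequal size."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    c = len(groups)
    df_e = sum(len(g) for g in groups) - c                # N - C
    q_crit = studentized_range.ppf(1 - alpha, c, df_e)    # Step 1: critical value
    results = []
    for i in range(c):
        for j in range(i + 1, c):
            diff = abs(groups[i].mean() - groups[j].mean())
            bound = q_crit * np.sqrt(mse / 2 * (1 / len(groups[i]) + 1 / len(groups[j])))
            results.append((i, j, diff, bound, diff > bound))   # Step 2: compare
    return results                                        # Step 3: all pairs covered
```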