Z-Test vs. T-Test, P-Value

been_29 · July 29, 2024

Statistics python

The Korea Economic Daily (한국경제신문) with Toss Bank MLOps Course


💡 Z-Test and T-Test

Z-tests and t-tests are statistical methods used to compare a sample mean to a population mean, or to compare the means of two samples.
The choice between a z-test and a t-test depends on whether the population standard deviation is known and on the sample size.

Z-Test

Used when the population variance (or standard deviation) is known and the sample size is large (typically $n > 30$).

Steps for Z-Test

Formulate Hypotheses
- Null Hypothesis ( $H_0$ ): The true mean of the population the sample was drawn from is equal to the hypothesized population mean $\mu$.
- Alternative Hypothesis ( $H_1$ ): The true mean is not equal to $\mu$.
Calculate the Z-Statistic

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

- where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.

Determine the Critical Value
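- Use the standard normal distribution (Z-distribution) to find the critical value corresponding to the significance level ( $\alpha$ ). For a two-tailed test this is $z_{1-\alpha/2}$, about 1.96 when $\alpha = 0.05$.
Make a Decision
- If the z-statistic falls into the rejection region (for a two-tailed test, $|z| > z_{1-\alpha/2}$), reject the null hypothesis; otherwise, fail to reject it.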

Example Code

```python
import numpy as np
from scipy.stats import norm

# Example data
sample_data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
population_mean = 75  # hypothesized population mean (mu) under H0
population_std = 5    # population standard deviation (sigma), assumed known

# Calculate sample mean
sample_mean = np.mean(sample_data)
# n = 12 here, below the usual n > 30 rule of thumb; the z-test still applies
# if sigma is known and the population is roughly normal
n = len(sample_data)

# Calculate the z-statistic
z_statistic = (sample_mean - population_mean) / (population_std / np.sqrt(n))

# Significance level
alpha = 0.05

# Determine the critical value for a two-tailed test
z_critical = norm.ppf(1 - alpha / 2)

# Make a decision
print(f"z-statistic: {z_statistic}, Critical value: {z_critical}")
if abs(z_statistic) > z_critical:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
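The critical-value decision above can also be expressed through a two-tailed p-value (the idea covered in the P-Value section below). A minimal sketch, reusing the same example data and the same assumption that $\sigma = 5$ is known:

```python
import numpy as np
from scipy.stats import norm

# Same example data and assumed-known population parameters as the z-test code
sample_data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
population_mean = 75
population_std = 5  # sigma, assumed known

n = len(sample_data)
z_statistic = (np.mean(sample_data) - population_mean) / (population_std / np.sqrt(n))

# Two-tailed p-value: probability that |Z| >= |z| when H0 is true.
# norm.sf(x) is the survival function 1 - norm.cdf(x), more accurate far in the tail.
p_value = 2 * norm.sf(abs(z_statistic))

alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)

print(f"z = {z_statistic:.4f}, p-value = {p_value:.6f}")
# Both rules give the same decision: |z| > z_critical exactly when p_value < alpha
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```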

T-Test

Used when the population variance is unknown and must be estimated from the sample; this matters most when the sample size is small ( $n < 30$ ). It is also used to compare the means of two samples.
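Why does sample size matter? Because $s$ is estimated from the data, the t-distribution has heavier tails than the standard normal, so its critical values are larger for small samples and shrink toward the z critical value as the degrees of freedom grow. A quick sketch comparing them at $\alpha = 0.05$ (the list of degrees of freedom is an arbitrary choice for illustration):

```python
from scipy.stats import norm, t

alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)  # two-tailed z critical value, about 1.96

# Two-tailed t critical values for increasing degrees of freedom
t_criticals = {}
for df in [2, 5, 11, 29, 100, 1000]:
    t_criticals[df] = t.ppf(1 - alpha / 2, df)
    print(f"df={df:>4}: t critical = {t_criticals[df]:.3f} (z critical = {z_critical:.3f})")
```

With only a few degrees of freedom the t critical value is noticeably larger than 1.96, which is why using the z-test with an estimated standard deviation on a small sample would reject too often.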

Steps for T-Test

Formulate Hypotheses
- Null Hypothesis ( $H_0$ ): The population mean is equal to the hypothesized value $\mu$ (one-sample t-test), or the two population means are equal (two-sample t-test).
- Alternative Hypothesis ( $H_1$ ): The population mean is not equal to $\mu$, or the two population means are not equal.
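Calculate the T-Statistic
- One-sample t-test:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

- where $s$ is the sample standard deviation (computed with $n-1$ in the denominator) and the other symbols are as in the z-test.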

Determine the Critical Value
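- Use the t-distribution with $n-1$ degrees of freedom to find the critical value corresponding to the significance level ( $\alpha$ ). For a two-tailed test this is $t_{1-\alpha/2,\,n-1}$.
Make a Decision
- If the t-statistic falls into the rejection region (for a two-tailed test, $|t| > t_{1-\alpha/2,\,n-1}$), reject the null hypothesis; otherwise, fail to reject it.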
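The steps above give only the one-sample statistic. For the two-sample case mentioned in the hypotheses, the standard equal-variance (pooled) form is shown below. It assumes the two groups are independent and share a common population variance, and it is the form `scipy.stats.ttest_ind` uses by default (`equal_var=True`):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

with $n_1 + n_2 - 2$ degrees of freedom, where $\bar{x}_i$, $s_i$, and $n_i$ are the mean, sample standard deviation, and size of group $i$.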

Example Code

```python
import numpy as np
from scipy.stats import t

# Example data
sample_data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
population_mean = 75  # hypothesized population mean (mu) under H0

# Calculate sample mean and sample standard deviation
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)  # ddof=1 uses the n - 1 denominator (sample standard deviation)
n = len(sample_data)

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(n))

# Significance level and degrees of freedom
alpha = 0.05
df = n - 1

# Determine the critical value for a two-tailed test
t_critical = t.ppf(1 - alpha / 2, df)

# Make a decision
print(f"t-statistic: {t_statistic}, Critical value: {t_critical}")
if abs(t_statistic) > t_critical:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
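As a sanity check, SciPy's `scipy.stats.ttest_1samp` should reproduce the hand-computed statistic, and the two-tailed p-value follows from the t-distribution with $n-1$ degrees of freedom. A small sketch (variable names are my own):

```python
import numpy as np
from scipy import stats

sample_data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
population_mean = 75

# Manual statistic, computed the same way as in the code above
n = len(sample_data)
t_manual = (np.mean(sample_data) - population_mean) / (np.std(sample_data, ddof=1) / np.sqrt(n))

# Two-tailed p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's one-sample t-test (two-sided by default)
result = stats.ttest_1samp(sample_data, popmean=population_mean)

print(f"manual t = {t_manual:.4f}, scipy t = {result.statistic:.4f}")
print(f"manual p = {p_manual:.6f}, scipy p = {result.pvalue:.6f}")
```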

💡 P-Value

What Is a P-Value?

Definition
- A measure used in statistical hypothesis testing to determine the statistical significance of the observed data.
- It represents the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true.
Hypothesis Testing
- Null Hypothesis ( $H_0$ ) : The hypothesis that there is no effect or no difference. It is the default or starting assumption.
- Alternative Hypothesis ( $H_1$ ) : The hypothesis that there is an effect or a difference. It is what you aim to support.
Interpretation of P-Value
- Low p-value ( $\le \alpha$ ): Indicates that the observed data would be unlikely if the null hypothesis were true. This leads to rejecting the null hypothesis.
- High p-value ( $> \alpha$ ): Indicates that the observed data are consistent with the null hypothesis. This leads to failing to reject the null hypothesis (which does not prove that $H_0$ is true).
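The definition "at least as extreme as the observed results, assuming $H_0$ is true" can be checked by simulation: generate many test statistics in a world where $H_0$ holds and count how often they are as extreme as the observed one. A minimal sketch for a z-statistic, where the observed value 2.1 is a made-up number chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical observed z-statistic (illustrative assumption, not from real data)
z_observed = 2.1

# Under H0 the z-statistic follows the standard normal distribution N(0, 1)
simulated = rng.standard_normal(1_000_000)

# Two-tailed p-value = fraction of simulated statistics at least as extreme as the observed one
p_simulated = np.mean(np.abs(simulated) >= abs(z_observed))
p_exact = 2 * norm.sf(abs(z_observed))

print(f"simulated p = {p_simulated:.4f}, exact p = {p_exact:.4f}")
```

The simulated fraction should land very close to the exact tail probability, which is all a p-value is.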

Steps in Hypothesis Testing Using P-Value

State the Hypotheses
- $H_0$ : Null hypothesis.
- $H_1$ : Alternative hypothesis.
Choose a Significance Level ( $\alpha$ )
- Common choices are 0.05, 0.01, etc.
Calculate the Test Statistic
- Depending on the test (t-test, z-test, etc.), calculate the corresponding test statistic.
Determine the P-Value
- Find the p-value associated with the test statistic.
Make a Decision
- Compare the p-value to $\alpha$ .
- If p-value $\le \alpha$ , reject $H_0$ .
- If p-value $> \alpha$ , fail to reject $H_0$ .
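For a two-tailed test, comparing the p-value to $\alpha$ gives the same decision as comparing $|t|$ to the critical value from the T-Test section. A sketch that checks this on the one-sample example data at a few significance levels (the levels are arbitrary choices):

```python
from scipy import stats

sample_data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
population_mean = 75

result = stats.ttest_1samp(sample_data, popmean=population_mean)  # two-sided by default
df = len(sample_data) - 1

decisions = []  # (alpha, p-value rule rejects?, critical-value rule rejects?)
for alpha in [0.05, 0.01, 0.001]:
    reject_by_p = result.pvalue <= alpha
    t_critical = stats.t.ppf(1 - alpha / 2, df)
    reject_by_critical = abs(result.statistic) > t_critical
    decisions.append((alpha, reject_by_p, reject_by_critical))
    print(f"alpha={alpha}: p-value rule rejects: {reject_by_p}, "
          f"critical-value rule rejects: {reject_by_critical}")
```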

Example Code

```python
from scipy import stats

# Example data for two groups
group1 = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
group2 = [68, 72, 69, 73, 66, 67, 75, 78, 65, 64, 70, 69]

# Perform an independent two-sample t-test (pooled variance, equal_var=True, by default)
t_statistic, p_value = stats.ttest_ind(group1, group2)

print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

# Decision based on p-value: reject H0 when p-value <= alpha
alpha = 0.05
if p_value <= alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
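To connect `stats.ttest_ind` back to the pooled two-sample formula, the sketch below recomputes the statistic and p-value by hand and compares them with SciPy. It also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption; the intermediate variable names are my own:

```python
import numpy as np
from scipy import stats

group1 = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
group2 = [68, 72, 69, 73, 66, 67, 75, 78, 65, 64, 70, 69]

x1 = np.asarray(group1, dtype=float)
x2 = np.asarray(group2, dtype=float)
n1, n2 = len(x1), len(x2)

# Pooled variance: assumes both groups share one population variance
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_pooled = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n1 + n2 - 2)

# SciPy: pooled form by default, Welch's t-test with equal_var=False
res_pooled = stats.ttest_ind(x1, x2)
res_welch = stats.ttest_ind(x1, x2, equal_var=False)

print(f"manual pooled: t = {t_pooled:.4f}, p = {p_pooled:.3e}")
print(f"scipy pooled:  t = {res_pooled.statistic:.4f}, p = {res_pooled.pvalue:.3e}")
print(f"scipy Welch:   t = {res_welch.statistic:.4f}, p = {res_welch.pvalue:.3e}")
```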