Standard Deviation vs. Standard Error, Hypothesis testing

been_29 · July 29, 2024

한국경제신문 (The Korea Economic Daily) with Toss bank MLOps course

💡 The difference between Standard Deviation and Standard Error

Standard Deviation

Definition
- A measure of the amount of variation or dispersion in a set of values.
- Quantifies how much the values in a data set differ from the mean (average) of the data set.
Calculation
- Variance ( $\sigma^2$ ) is the average of the squared differences from the mean.
- Standard deviation ( $\sigma$ ) is the square root of the variance.

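$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where $N$ is the number of values, $x_i$ is the $i$-th value, and $\mu$ is the mean of the values.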
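To make the formula concrete, here is a minimal sketch that follows it term by term using only the standard library. The data list is the same one used in the example further down; the variable names are chosen here for illustration.

```python
import math

data = [70, 75, 80, 85, 90]

N = len(data)
mu = sum(data) / N                              # mean of the values
squared_diffs = [(x - mu) ** 2 for x in data]   # (x_i - mu)^2 for each value
variance = sum(squared_diffs) / N               # average squared difference
sigma = math.sqrt(variance)                     # square root of the variance

print(f"mean={mu}, variance={variance}, sigma={sigma:.4f}")
# mean=80.0, variance=50.0, sigma=7.0711
```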

Interpretation
- A high standard deviation indicates that the data points are spread out over a wide range of values.
- A low standard deviation indicates that they are closer to the mean.
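As a small illustration of this interpretation, the sketch below compares two made-up score lists (both hypothetical) that share the same mean but differ in spread.

```python
import numpy as np

# Two hypothetical score lists with the same mean (80) but different spread
tight = [78, 79, 80, 81, 82]
wide = [60, 70, 80, 90, 100]

for name, values in [("tight", tight), ("wide", wide)]:
    print(f"{name}: mean={np.mean(values):.1f}, std={np.std(values):.2f}")
# tight: mean=80.0, std=1.41
# wide: mean=80.0, std=14.14
```

The means are identical, so only the standard deviation reveals that the second list is far more spread out.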

Example Code

import numpy as np

# Example data
data = [70, 75, 80, 85, 90]

# Calculate standard deviation
standard_deviation = np.std(data)
print(f"Standard Deviation: {standard_deviation}")
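One detail worth knowing: `np.std` defaults to `ddof=0`, which divides by $N$ and matches the population formula above. When the data are a sample from a larger population, the sample standard deviation (dividing by $N - 1$) is normally used instead, via `ddof=1`. The sketch below compares the two and cross-checks them against Python's `statistics` module.

```python
import statistics

import numpy as np

data = [70, 75, 80, 85, 90]

population_sd = np.std(data)        # ddof=0: divide by N
sample_sd = np.std(data, ddof=1)    # ddof=1: divide by N - 1

print(f"Population SD (ddof=0): {population_sd:.4f}")  # 7.0711
print(f"Sample SD (ddof=1):     {sample_sd:.4f}")      # 7.9057

# The standard library gives the same two values
assert np.isclose(population_sd, statistics.pstdev(data))
assert np.isclose(sample_sd, statistics.stdev(data))
```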

Standard Error

Definition
- The standard deviation of the sampling distribution of a statistic (most often the sample mean), usually estimated from a single sample.
- Measures how much the sample mean ( $\bar{x}$ ) is expected to vary from the true population mean ( $\mu$ ).
Calculation
- Standard error ( $SE$ ) is the sample standard deviation divided by the square root of the sample size.

$$SE = \frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation and $n$ is the sample size.
Interpretation
- Indicates the reliability of the sample mean as an estimate of the population mean.
- A smaller standard error indicates a more precise estimate of the population mean.
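The idea of a "sampling distribution" can be checked with a quick simulation. The sketch below assumes a hypothetical normal population (mean 80, standard deviation 10, both chosen only for illustration), draws many samples of size 25, and compares the spread of the resulting sample means with $\sigma / \sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population: normal with mean 80 and standard deviation 10
mu, sigma, n = 80, 10, 25

# Draw 10,000 samples of size n and record the mean of each one
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

empirical_se = sample_means.std(ddof=1)   # spread of the sample means
theoretical_se = sigma / np.sqrt(n)       # sigma / sqrt(n) = 2.0

print(f"Empirical SE:   {empirical_se:.3f}")
print(f"Theoretical SE: {theoretical_se:.3f}")
```

The two numbers should land close to each other, which is what it means for the standard error to describe how much the sample mean varies from sample to sample.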

Example Code

import numpy as np

# Example data (sample)
sample_data = [70, 75, 80, 85, 90]

# Calculate sample standard deviation
sample_standard_deviation = np.std(sample_data, ddof=1)

# Sample size
n = len(sample_data)

# Calculate standard error
standard_error = sample_standard_deviation / np.sqrt(n)
print(f"Standard Error: {standard_error}")
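SciPy also provides `scipy.stats.sem`, which computes the standard error of the mean directly and uses `ddof=1` by default. A short cross-check against the manual calculation, on the same sample data:

```python
import numpy as np
from scipy import stats

sample_data = [70, 75, 80, 85, 90]

manual_se = np.std(sample_data, ddof=1) / np.sqrt(len(sample_data))
scipy_se = stats.sem(sample_data)  # ddof=1 by default

print(f"Manual SE: {manual_se:.4f}")  # 3.5355
print(f"SciPy SE:  {scipy_se:.4f}")   # 3.5355
```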

Difference between Standard Deviation and Standard Error

| | Purpose | Calculation | Usage | Dependence on Sample Size |
|---|---|---|---|---|
| Standard Deviation | Measures the spread of data points around the mean in a single sample. | Calculated directly from the data points. | Used to describe the variability within a dataset. | Does not shrink systematically as the sample size grows; a larger sample only estimates it more precisely. |
| Standard Error | Measures how much the sample mean is expected to vary from the true population mean. | Derived from the sample standard deviation and the sample size. | Used to describe the precision of a sample mean as an estimate of the population mean. | Decreases as the sample size increases (larger samples give more precise estimates of the population mean). |
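The sample-size behavior can be seen in a short simulation. This is a sketch under assumed settings: a hypothetical normal population with mean 80 and standard deviation 10, sampled at a few arbitrary sizes.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

results = []
# Hypothetical population: normal with mean 80 and standard deviation 10
for n in [10, 100, 1_000, 10_000]:
    sample = rng.normal(80, 10, size=n)
    sd = sample.std(ddof=1)      # sample standard deviation
    se = sd / np.sqrt(n)         # standard error of the mean
    results.append((n, sd, se))
    print(f"n={n:>6}: SD={sd:6.2f}  SE={se:6.3f}")
```

The standard deviation hovers around the population value of 10 at every size, while the standard error keeps falling roughly in proportion to $1 / \sqrt{n}$.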

💡 Hypothesis Testing

A statistical method used to make decisions about a population parameter based on sample data.

Steps for Hypothesis Testing

Formulate Hypotheses
- Null Hypothesis ( $H_0$ ) : A statement that there is no effect or no difference.
- Alternative Hypothesis ( $H_1$ ) : A statement that contradicts the null hypothesis.
Set Significance Level ( $\alpha$ )
- The probability of rejecting the null hypothesis when it is actually true (Type I error). Common values are 0.05 (5%) and 0.01 (1%).
Calculate Test Statistic
- Based on the sample data, compute a test statistic (e.g., $t$-statistic, $z$-statistic) that measures how far the sample mean deviates from the null hypothesis mean, in units of standard error.
Determine the Rejection Region
- Use the significance level and the distribution of the test statistic to find the critical value(s). The rejection region is the range of values for which the null hypothesis is rejected.
Make a Decision
- If the test statistic falls into the rejection region, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
Draw a Conclusion
- Based on the decision, conclude whether there is enough evidence to support the alternative hypothesis.
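To see what the significance level means in practice, the sketch below simulates many samples from a population where the null hypothesis is actually true and counts how often the steps above would reject it anyway. The population (normal, mean 75, standard deviation 5), the sample size, and the number of trials are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

mu_0, alpha, n, trials = 75, 0.05, 12, 10_000
t_critical = stats.t.ppf(1 - alpha / 2, n - 1)  # two-tailed critical value

# Every sample comes from a population where H0 is true (mean == mu_0)
samples = rng.normal(mu_0, 5, size=(trials, n))
standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_stats = (samples.mean(axis=1) - mu_0) / standard_errors

# Fraction of samples whose t-statistic falls in the rejection region
false_rejection_rate = np.mean(np.abs(t_stats) > t_critical)
print(f"Share of samples that reject H0: {false_rejection_rate:.3f}")
```

The share should come out near 0.05, which is the Type I error rate that $\alpha$ is meant to control.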

Example Code

import numpy as np
from scipy import stats

# Example data (students' scores)
data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]

# Calculate sample mean and sample standard deviation
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)
n = len(data)

# Null hypothesis mean
mu_0 = 75

# Calculate t-statistic
t_statistic = (sample_mean - mu_0) / (sample_std / np.sqrt(n))

# Significance level and degrees of freedom
alpha = 0.05
df = n - 1

# Determine the critical value for a two-tailed test
t_critical = stats.t.ppf(1 - alpha/2, df)

# Make a decision
if abs(t_statistic) > t_critical:
    print(f"t-statistic: {t_statistic}, Critical value: {t_critical}")
    print("Reject the null hypothesis.")
else:
    print(f"t-statistic: {t_statistic}, Critical value: {t_critical}")
    print("Fail to reject the null hypothesis.")
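The same test can also be run with `scipy.stats.ttest_1samp`, which returns the t-statistic together with a two-sided p-value. The sketch below reuses the data and `mu_0` from the example above and compares the p-value with `alpha` instead of comparing the statistic with a critical value; the two decision rules agree for a two-tailed test.

```python
import numpy as np
from scipy import stats

data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
mu_0 = 75
alpha = 0.05

# One-sample t-test; the default alternative is two-sided
result = stats.ttest_1samp(data, popmean=mu_0)

# Manual statistic, computed the same way as in the example above
manual_t = (np.mean(data) - mu_0) / (np.std(data, ddof=1) / np.sqrt(len(data)))

print(f"SciPy t-statistic:  {result.statistic:.4f}")
print(f"Manual t-statistic: {manual_t:.4f}")
print(f"p-value: {result.pvalue:.4f}")
print("Reject the null hypothesis." if result.pvalue < alpha
      else "Fail to reject the null hypothesis.")
```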
