Standard Deviation vs. Standard Error, Hypothesis testing

been_29 · July 29, 2024

💡 The difference between Standard Deviation and Standard Error


Standard Deviation

  • Definition
    • A measure of the amount of variation or dispersion in a set of values.
    • Quantifies how much the values in a data set differ from the mean (average) of that data set.
  • Calculation
    • Variance ($\sigma^2$) is the average of the squared differences from the mean.
    • Standard deviation ($\sigma$) is the square root of the variance.

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
  • Interpretation
    • A high standard deviation indicates that the data points are spread out over a wide range of values.
    • A low standard deviation indicates that they are closer to the mean.
  • Example Code
    import numpy as np
    
    # Example data
    data = [70, 75, 80, 85, 90]
    
    # Calculate the population standard deviation (np.std divides by N by default, ddof=0)
    standard_deviation = np.std(data)
    print(f"Standard Deviation: {standard_deviation}")
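The formula above divides by $N$, which is what `np.std` does by default. When the data is a sample rather than the whole population, the usual choice is to divide by $N - 1$ instead (`ddof=1`). A minimal sketch comparing the two on the same example data:

```python
import numpy as np

data = [70, 75, 80, 85, 90]

# Population standard deviation: divides by N (NumPy's default, ddof=0)
population_std = np.std(data)

# Sample standard deviation: divides by N - 1 (Bessel's correction, ddof=1)
sample_std = np.std(data, ddof=1)

print(f"Population SD (ddof=0): {population_std:.4f}")
print(f"Sample SD (ddof=1): {sample_std:.4f}")
```

The sample version is always slightly larger, and the gap shrinks as the number of data points grows.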

Standard Error

  • Definition

    • An estimate of the standard deviation of the sampling distribution of a statistic, most commonly the sample mean.
    • Measures how much the sample mean ($\bar{x}$) is expected to vary from the true population mean ($\mu$).
  • Calculation
    • The standard error ($SE$) is the sample standard deviation $s$ divided by the square root of the sample size $n$:

    $$SE = \frac{s}{\sqrt{n}}$$
  • Interpretation

    • Indicates the reliability of the sample mean as an estimate of the population mean.
    • A smaller standard error indicates a more precise estimate of the population mean.
  • Example Code

    import numpy as np
    
    # Example data (sample)
    sample_data = [70, 75, 80, 85, 90]
    
    # Calculate sample standard deviation
    sample_standard_deviation = np.std(sample_data, ddof=1)
    
    # Sample size
    n = len(sample_data)
    
    # Calculate standard error
    standard_error = sample_standard_deviation / np.sqrt(n)
    print(f"Standard Error: {standard_error}")
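As a cross-check, SciPy provides `scipy.stats.sem`, which computes the standard error of the mean and uses `ddof=1` by default. The sketch below assumes SciPy is installed and compares it with the manual calculation on the same sample:

```python
import numpy as np
from scipy import stats

sample_data = [70, 75, 80, 85, 90]

# Manual calculation: s / sqrt(n), with s computed using ddof=1
manual_se = np.std(sample_data, ddof=1) / np.sqrt(len(sample_data))

# SciPy's built-in standard error of the mean (ddof=1 by default)
scipy_se = stats.sem(sample_data)

print(f"Manual SE: {manual_se:.4f}")
print(f"SciPy SE:  {scipy_se:.4f}")
```

Both lines should print the same value, since they implement the same formula.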

Difference between Standard Deviation and Standard Error

| Measure | Purpose | Calculation | Usage | Dependence on Sample Size |
|---|---|---|---|---|
| Standard Deviation | Measure the spread of data points around the mean in a single sample. | Calculated directly from the data points. | Used to describe the variability within a dataset. | Independent of sample size. |
| Standard Error | Measure how much the sample mean is expected to vary from the true population mean. | Derived from the standard deviation of the sample and the sample size. | Used to describe the accuracy of a sample mean as an estimate of the population mean. | Decreases as the sample size increases (larger sample sizes provide more accurate estimates of the population mean). |
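The sample-size behavior can be illustrated with a small simulation. This is only a sketch: the population (normal, mean 80, standard deviation 10) and the sample sizes are assumed values chosen for illustration, not taken from real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Hypothetical population parameters (assumed for illustration only)
population_mean = 80
population_sd = 10

results = {}
for n in (10, 100, 1000, 10000):
    sample = rng.normal(loc=population_mean, scale=population_sd, size=n)
    sd = np.std(sample, ddof=1)   # sample standard deviation
    se = sd / np.sqrt(n)          # standard error of the mean
    results[n] = (sd, se)
    print(f"n={n:>5}  SD={sd:6.3f}  SE={se:6.3f}")
```

The standard deviation should hover around the population value at every sample size, while the standard error keeps shrinking roughly in proportion to $1/\sqrt{n}$.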








💡 Hypothesis Testing

A statistical method used to make decisions about a population parameter based on sample data.

Steps for Hypothesis Testing

  1. Formulate Hypotheses
    • Null Hypothesis ($H_0$): A statement that there is no effect or no difference.
    • Alternative Hypothesis ($H_1$): A statement that contradicts the null hypothesis.
  2. Set Significance Level ($\alpha$)
    • The probability of rejecting the null hypothesis when it is actually true (Type I error). Common values are 0.05 (5%) or 0.01 (1%).
  3. Calculate Test Statistic
    • Based on the sample data, compute a test statistic (e.g., $t$-statistic, $z$-statistic) that measures how far the sample mean deviates from the null hypothesis mean.
  4. Determine the Rejection Region
    • Use the significance level and the distribution of the test statistic to find the critical value(s). The rejection region is the range of values for which the null hypothesis is rejected.
  5. Make a Decision
    • If the test statistic falls into the rejection region, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
  6. Draw a Conclusion
    • Based on the decision, conclude whether there is enough evidence to support the alternative hypothesis.
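The meaning of the significance level in step 2 can be checked with a simulation. The sketch below repeatedly draws samples from a population where the null hypothesis is actually true and counts how often a two-tailed one-sample t-test rejects it. The population (normal, mean 75, standard deviation 5), the sample size, and the number of repetitions are all assumed values for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible

alpha = 0.05
mu_0 = 75               # null hypothesis mean (also the true mean here)
n_experiments = 10_000  # number of simulated studies (assumed)
n = 12                  # observations per study (assumed)

# Each row is one simulated study drawn from a population where H0 is true
samples = rng.normal(loc=mu_0, scale=5, size=(n_experiments, n))

# One-sample t-test per row; returns one p-value per study
t_stats, p_values = stats.ttest_1samp(samples, popmean=mu_0, axis=1)

# Fraction of studies that wrongly reject H0 (Type I errors)
false_positive_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {false_positive_rate:.3f}")
```

The observed rate should land close to the chosen $\alpha$ of 0.05, which is what the significance level promises.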

Example Code

import numpy as np
from scipy import stats

# Example data (students' scores)
data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]

# Calculate sample mean and sample standard deviation
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)
n = len(data)

# Null hypothesis mean
mu_0 = 75

# Calculate t-statistic
t_statistic = (sample_mean - mu_0) / (sample_std / np.sqrt(n))

# Significance level and degrees of freedom
alpha = 0.05
df = n - 1

# Determine the critical value for a two-tailed test
t_critical = stats.t.ppf(1 - alpha/2, df)

# Make a decision
print(f"t-statistic: {t_statistic}, Critical value: {t_critical}")
if abs(t_statistic) > t_critical:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
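An equivalent way to reach the decision is to compare a p-value with the significance level instead of comparing the test statistic with a critical value. A minimal sketch using `scipy.stats.ttest_1samp`, which runs a two-tailed one-sample t-test by default, on the same scores:

```python
from scipy import stats

# Same example data (students' scores) and null hypothesis mean as above
data = [78, 82, 79, 83, 76, 77, 85, 88, 75, 74, 80, 79]
mu_0 = 75
alpha = 0.05

# Two-tailed one-sample t-test
result = stats.ttest_1samp(data, popmean=mu_0)
print(f"t-statistic: {result.statistic:.4f}, p-value: {result.pvalue:.4f}")

# Reject H0 when the p-value is below the significance level
if result.pvalue < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

For a two-tailed test, the p-value falls below $\alpha$ exactly when the absolute t-statistic exceeds the critical value, so both approaches give the same decision.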