Sampling Distribution for Categorical Data
CLT for Proportions
• The sampling distribution of sample proportions is nearly normal, centered at the population proportion p, with a standard error inversely proportional to the square root of the sample size: SE = sqrt(p(1 − p) / n).
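A minimal sketch of how the standard error of a sample proportion shrinks as the sample grows. The function name standard_error and the sample sizes are illustrative assumptions; p = 0.9 is borrowed from the angiosperm example later in these notes.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Assumed illustrative sample sizes, each 4 times the previous one.
se_by_n = {n: standard_error(0.9, n) for n in (50, 200, 800)}

for n, se in se_by_n.items():
    print(f"n = {n:4d}  SE = {se:.4f}")
```

Each fourfold increase in n halves the standard error, because the standard error scales with 1 / sqrt(n) rather than 1 / n.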
Conditions for the CLT
1) Independence: Sampled observations must be independent.
2) Sample size/skew: There should be at least 10 successes and 10 failures in the sample, i.e. np ≥ 10 and n(1 − p) ≥ 10.
• This is the same success-failure condition used for the normal approximation to the binomial distribution. It applies here because the sample proportion needs to be nearly normally distributed.
• When considering the sampling distribution of sample proportions, there is no requirement of n ≥ 30. To determine whether a sample of categorical data is large enough, we check the success-failure condition instead.
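The success-failure check can be sketched as a small helper. The function name meets_success_failure and the sample size of 200 are assumptions made for illustration; n = 50 and p = 0.9 come from the example later in these notes. The helper uses expected counts computed from a known p. When only a sample is available, the observed counts would be used instead.

```python
def meets_success_failure(n: int, p: float, threshold: float = 10) -> bool:
    """Return True when both expected counts reach the threshold."""
    eps = 1e-9  # small tolerance for floating-point rounding in n * (1 - p)
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return (expected_successes >= threshold - eps
            and expected_failures >= threshold - eps)

# n = 200 is a hypothetical sample size; n = 50 is taken from the notes.
print(meets_success_failure(200, 0.9))  # True: 180 successes, 20 failures
print(meets_success_failure(50, 0.9))   # False: only 5 expected failures
```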
(Example)
Because the observation in this case is a sample proportion, its variability is measured by the standard error, which gives a Z score of 2.36.
• The same calculation can also be done using the binomial distribution.
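A hedged sketch of both routes, the normal approximation and the exact binomial calculation. The example's numbers are not reproduced in these notes, so n = 200 and an observed proportion of 0.95 are assumptions chosen for illustration; only p = 0.9 appears in the notes. With these assumed inputs the Z score comes out close to the 2.36 quoted above.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p = 0.90      # population proportion (from the notes)
n = 200       # hypothetical sample size (assumption)
p_hat = 0.95  # hypothetical observed sample proportion (assumption)

# Normal approximation: Z = (p_hat - p) / SE
se = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / se
normal_tail = 1 - normal_cdf(z)

# Exact binomial: P(X >= k_min), where k_min successes give a proportion of p_hat
k_min = math.ceil(n * p_hat - 1e-9)
binom_tail = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
)

print(f"SE = {se:.4f}, Z = {z:.2f}")
print(f"P(p_hat >= {p_hat}) normal approx = {normal_tail:.4f}")
print(f"P(X >= {k_min}) exact binomial   = {binom_tail:.4f}")
```

The two tail probabilities are close but not identical, since the normal curve is only an approximation to the discrete binomial distribution.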
What if the success-failure condition is not met?
Shape of the Sampling Distribution
(Back to example)
What would you expect the shape of the sampling distribution of percentages of angiosperms in random samples of 50 plants to look like? (Remember, 90% of all plant species are classified as angiosperms.)
The success-failure condition is not met:
50 × 0.9 = 45 ≥ 10, but 50 × 0.1 = 5 < 10.
Therefore, the CLT doesn't apply and the sampling distribution is not nearly normal. The sampling distribution is centered at the true population proportion, which is close to 1, and a sample proportion can never exceed 1. We therefore expect a shorter tail on the right side and a longer tail on the left, yielding a left-skewed distribution.
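The left skew can be checked numerically. The sketch below builds the exact distribution of the sample proportion for n = 50 and p = 0.9 from the binomial probabilities and computes its skewness. Using skewness as the summary is a choice made here for illustration, not something the notes prescribe.

```python
import math

n, p = 50, 0.9

# Exact probabilities for each possible count of angiosperms, k = 0..50
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
props = [k / n for k in range(n + 1)]  # corresponding sample proportions

mean = sum(x * w for x, w in zip(props, pmf))
var = sum((x - mean) ** 2 * w for x, w in zip(props, pmf))
skew = sum((x - mean) ** 3 * w for x, w in zip(props, pmf)) / var**1.5

print(f"mean = {mean:.3f}, SE = {math.sqrt(var):.4f}, skewness = {skew:.3f}")
```

The skewness comes out negative (about −0.38), which corresponds to a longer left tail. It agrees with the closed-form binomial skewness (1 − 2p) / sqrt(np(1 − p)).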