[Graphically]
1. Frequency distribution (table)
2. Bar chart (discrete data)
3. Histogram (continuous data)
4. Frequency polygon
5. Scatter (dot) plot
6. Pareto chart
Pareto chart
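- Pareto principle = "80-20 rule"
- The majority of the results are caused by a small portion of the factors
- e.g., 80% of crimes are committed by 20% of criminals
- Chart: categories drawn as bars in descending order of frequency, with a line showing the cumulative percentage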
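The table behind a Pareto chart can be sketched as follows: count each category, sort from most to least frequent, and keep a running cumulative percentage. The defect causes and counts are hypothetical, and the plotting step is left out.

```python
from collections import Counter

def pareto_table(categories):
    """Return (category, count, cumulative %) rows, most frequent category first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # most_common() with no argument returns every category, highest count first
    for category, count in counts.most_common():
        cumulative += count
        rows.append((category, count, 100 * cumulative / total))
    return rows

# Hypothetical defect log: each entry is the cause recorded for one defect
defects = (["scratch"] * 50 + ["dent"] * 30 + ["misprint"] * 10
           + ["crack"] * 6 + ["other"] * 4)
for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<10}{count:>5}{cum_pct:>8.1f}%")
```

In this made-up data, 2 of the 5 causes (40%) account for 80% of the defects, which is the pattern the Pareto principle describes.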
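Determination of the number of classes and the class interval
- 2^k rule (recommended) for the number of classes k: choose the smallest k such that 2^k > n (number of observations)
- Class interval (width): i >= (H - L)/k, where H = highest value and L = lowest value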
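A minimal sketch of the two rules, used to build a frequency distribution table. Rounding the width up to a whole number, starting the first class at L, and treating classes as "lower limit included, upper limit excluded" are assumptions for illustration; the data set is hypothetical.

```python
import math

def class_count(n):
    """2^k rule: the smallest number of classes k with 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(data, k):
    """Width i >= (H - L) / k, rounded up to the next whole number (an assumption)."""
    return math.ceil((max(data) - min(data)) / k)

def frequency_table(data, k, width, start=None):
    """Count observations in k equal-width classes beginning at `start` (default: L)."""
    lower = min(data) if start is None else start
    table = []
    for j in range(k):
        lo = lower + j * width
        hi = lo + width
        is_last = j == k - 1
        # Each class is [lo, hi); the last class also keeps hi so H is never dropped
        count = sum(1 for x in data if lo <= x < hi or (is_last and x == hi))
        table.append(((lo, hi), count))
    return table

# Hypothetical sample of n = 20 observations (L = 12, H = 58)
data = [12, 15, 18, 21, 23, 25, 27, 30, 31, 33,
        35, 36, 38, 40, 42, 45, 47, 50, 55, 58]
k = class_count(len(data))   # 2**4 = 16 <= 20 < 32 = 2**5, so k = 5
i = class_width(data, k)     # (58 - 12) / 5 = 9.2, rounded up to 10
for (lo, hi), count in frequency_table(data, k, i):
    print(f"{lo:>3} up to {hi:>3}: {count}")
```

In practice the width is often rounded to a "convenient" number (such as 10, 50 or 100) rather than just the next integer.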
[Numerically]
- Central tendency: a typical value representing the data set (e.g., mean, median, mode)
- Dispersion: how spread out the data are
  1. Variance
  2. Standard deviation
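The measures listed above can be computed with Python's standard `statistics` module. The notes do not say whether population or sample formulas are meant, so both are sketched here: the population variance divides by N, the sample variance by n - 1. The data are hypothetical.

```python
import statistics

def population_variance(data):
    """sigma^2 = sum((x - mu)^2) / N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# Central tendency
print("mean:", statistics.mean(data))      # 40 / 8 = 5
print("median:", statistics.median(data))  # average of the middle values 4 and 5
print("mode:", statistics.mode(data))      # 4 occurs most often

# Dispersion: the hand-written formulas should agree with the standard library
print("population variance:", population_variance(data), statistics.pvariance(data))
print("population SD:", population_variance(data) ** 0.5, statistics.pstdev(data))
print("sample variance:", sample_variance(data), statistics.variance(data))
print("sample SD:", sample_variance(data) ** 0.5, statistics.stdev(data))
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.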
Chebyshev's Theorem
- For any set of data, at least 1 - 1/k^2 of the values lie within ±k standard deviations of the mean (where k > 1)
- e.g., k = 2: at least 1 - 1/4 = 75%; k = 3: at least 1 - 1/9 ≈ 88.9%
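A short sketch that checks Chebyshev's bound against a data set. The data are hypothetical and deliberately skewed, since the theorem does not require a bell shape; the population standard deviation is used.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 on the fraction of values within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of values in [mean - k*sigma, mean + k*sigma], sigma = population SD."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)

# Hypothetical, deliberately skewed data (one large outlier)
data = [1, 1, 2, 2, 3, 3, 4, 20]
for k in (1.5, 2, 3):
    print(f"k = {k}: bound {chebyshev_bound(k):.3f}, observed {fraction_within(data, k):.3f}")
```

The observed fraction is always at least the bound; the bound is often conservative because it must hold for every possible data set.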
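Empirical rule
- Applies to bell-shaped (mound-shaped), symmetric data, i.e., an approximately normal distribution
- About 68% of the values lie within ±1 standard deviation of the mean, about 95% within ±2, and about 99.7% within ±3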
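The empirical-rule percentages come from the normal distribution itself. This sketch computes them with `statistics.NormalDist` from the Python standard library; the helper name `normal_within` is hypothetical.

```python
from statistics import NormalDist

def normal_within(k, dist=NormalDist(mu=0, sigma=1)):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {normal_within(k):.4f}")
```

This prints about 0.6827, 0.9545 and 0.9973. The result does not depend on the particular mean and standard deviation, and at k = 2 it is well above Chebyshev's 75% lower bound, because the empirical rule assumes a bell shape while Chebyshev's theorem does not.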
Why dispersion?
1) Enables us to compare the spread of two or more distributions.
2) Tells us how spread out the data are, and therefore how well the mean represents them.
- A large measure of dispersion indicates the mean is not a reliable representative of the data.
- A small measure of dispersion indicates the data are concentrated around the mean.
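To illustrate the points above, the sketch compares two hypothetical data sets that share the same mean but differ in spread, using the range and the population standard deviation.

```python
import statistics

def spread_summary(data):
    """Return (mean, range, population standard deviation) for a data set."""
    return statistics.mean(data), max(data) - min(data), statistics.pstdev(data)

# Two hypothetical data sets with the same mean but very different spread
a = [48, 49, 50, 51, 52]
b = [30, 40, 50, 60, 70]
for name, data in (("a", a), ("b", b)):
    mean, rng, sd = spread_summary(data)
    print(f"{name}: mean {mean}, range {rng}, SD {sd:.2f}")
```

Both means are 50, but every value in `a` is within 2 of it, while values in `b` are up to 20 away: the mean summarizes `a` well and `b` poorly, which only the dispersion measures reveal.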
Why standard deviation?
1) Provides a meaningful comparison among different sets of data.
- Tells whether the data are widely spread or concentrated around the mean.
2) Provides a measure of uncertainty.
- A large SD indicates a high degree of uncertainty.
- A small SD indicates a low degree of uncertainty.
3) Indicates the proportion of data values lying within a given number of standard deviations of the mean (Chebyshev's theorem, empirical rule).
Measures of position
- Quartiles, deciles, percentiles
- Location of the P-th percentile: L_P = (n + 1) × P/100, where n = number of observations sorted in ascending order
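A minimal sketch of the location formula. The notes give only L_P = (n + 1) × P/100; interpolating linearly between neighbouring values when L_P is fractional, and clamping to the smallest or largest value when L_P falls outside 1..n, are common conventions assumed here. The data are hypothetical.

```python
import math

def percentile(data, p):
    """P-th percentile located at L_p = (n + 1) * p / 100 in the sorted data (1-based)."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100
    # Assumption: clamp locations that fall before the first or after the last value
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]
    below = math.floor(loc)   # whole part: 1-based position of the lower neighbour
    frac = loc - below        # fractional part: how far toward the next value
    return xs[below - 1] + frac * (xs[below] - xs[below - 1])

data = [13, 1, 9, 17, 5, 11, 3, 15, 7]   # hypothetical, n = 9
for p in (25, 50, 75):
    print(p, percentile(data, p))
```

For Q1 here, L_25 = 10 × 25/100 = 2.5, so Q1 lies halfway between the 2nd and 3rd sorted values (3 and 5), giving 4.0. Quartiles and deciles are the same calculation with P = 25, 50, 75 and P = 10, 20, ..., 90.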