Describing Data

yozzum · January 28, 2025

Statistics


[Graphically]
1. Frequency distribution (table)
2. Bar chart (discrete data)
3. Histogram (continuous data)
4. Frequency polygon
5. Scatter (dot) plot
6. Pareto chart

Pareto chart

  • Pareto principle = "80-20 rule"
    • The majority of the "results" are caused by a small portion of the "factors"
      • e.g., 80% of crimes are committed by 20% of criminals
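
A Pareto chart is a bar chart whose categories are sorted from most to least frequent, usually with a cumulative-percentage line on top. The table behind such a chart might be built as in the minimal sketch below. The function name `pareto_table` and the defect data are hypothetical, made up only for illustration.

```python
from collections import Counter

def pareto_table(categories):
    """Return (category, count, cumulative_percent) rows, largest count first."""
    counts = Counter(categories).most_common()  # sorted by descending count
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical defect log: each entry is the cause of one defect.
defects = ["scratch"] * 50 + ["dent"] * 30 + ["stain"] * 15 + ["crack"] * 5

for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<8} {count:>3} {cum_pct:6.1f}%")
```

In this made-up data, the top two of four causes account for 80% of all defects, which is the kind of pattern a Pareto chart is meant to expose.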

Determination of class interval

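  • 2^k rule (recommended): choose the smallest number of classes k such that 2^k > n, where n is the number of observations
  • Class interval (width) i >= (H - L)/k, where H is the highest value and L is the lowest value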
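
As a rough sketch of the two rules above: the helper names `num_classes` and `class_width` are hypothetical, and the figures (80 observations ranging from 15 to 85) are invented for the example.

```python
def num_classes(n):
    """Smallest k such that 2**k > n (the 2^k rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(low, high, k):
    """Minimum class interval (H - L) / k; in practice it is rounded up
    to a convenient number."""
    return (high - low) / k

# Hypothetical data set: 80 observations, lowest value 15, highest value 85.
k = num_classes(80)            # 2**6 = 64 is not > 80, 2**7 = 128 is
width = class_width(15, 85, k)
print(k, width)
```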

[Numerically]

  • Central tendency: a typical value that represents the data set
      1. mean 2. median 3. mode
  • Dispersion: how spread out the values are
      1. variance 2. standard deviation
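
These measures can be computed with Python's standard `statistics` module, as in the sketch below. The data set is hypothetical. The split between population and sample versions is not spelled out in the notes above and is added here as an assumption about which one you might need.

```python
import statistics

# Hypothetical data, chosen so the results come out as round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Population versions divide by n; sample versions divide by n - 1.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(mean, median, mode)
print(pop_var, pop_sd)
print(sample_var, sample_sd)
```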

Chebyshev's Theorem

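  • For any set of data, at least 1 - 1/k^2 of the values lie within ±k standard deviations of the mean (where k > 1)
  • e.g., for k = 2, at least 1 - 1/4 = 75% of the values lie within ±2 standard deviations of the mean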
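
A small check of the theorem on a deliberately skewed, hypothetical data set is sketched below. The helper names `chebyshev_bound` and `fraction_within` are made up for this example, and the population standard deviation is used as an assumption.

```python
import statistics

def chebyshev_bound(k):
    """Guaranteed minimum fraction of values within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of values within k population SDs of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

# Hypothetical, heavily skewed data: nine small values and one outlier.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
```

The bound is a floor, not an estimate: the observed fraction is usually well above it, but the guarantee holds whatever the shape of the data.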

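Empirical rule

  • Applies to symmetric, bell-shaped (mound-shaped) data, i.e. an approximately normal distribution
  • About 68% of the values lie within ±1 standard deviation of the mean, about 95% within ±2, and about 99.7% within ±3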
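
The usual 68 / 95 / 99.7 coverage figures for a normal distribution can be reproduced with the standard library's error function, since P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal Z. The function name `normal_coverage` is hypothetical.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {normal_coverage(k):.4f}")
```

Unlike Chebyshev's bound, these figures only apply when the data are roughly bell-shaped.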

Why dispersion?

1) Enables us to compare the spread of two or more distributions.
2) Tells us how spread out the data are.

  • A large measure of dispersion indicates that the mean is not a reliable representative of the data.
  • A small measure of dispersion indicates that the data are clustered closely around the mean.
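
To illustrate, here are two hypothetical data sets with the same mean but very different spread. The values are invented for the example.

```python
import statistics

# Two hypothetical data sets with the same mean.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    print(name, statistics.mean(values), round(statistics.pstdev(values), 2))
```

Both means are 50, but 50 describes a typical value of the first set far better than of the second.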

Why standard deviation?

1) Provides a meaningful comparison among different sets of data.

  • It tells whether the data are widely spread or clustered around the mean.

2) Provides a measure of uncertainty.

  • A large SD indicates a high degree of uncertainty.
  • A small SD indicates a low degree of uncertainty.

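3) Lets us estimate the proportion of values that lie within a given number of standard deviations of the mean (via Chebyshev's theorem or the empirical rule).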

Measures of position

  • Quartiles, deciles, percentiles
  • Location of the P-th percentile = (n + 1) × P/100, where n is the number of observations, sorted in ascending order
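
The location formula might be applied as in the sketch below. The function names and the data are hypothetical. Interpolating linearly when the location is not a whole number is an assumption (a common textbook convention), not something stated in the notes above.

```python
def percentile_location(n, p):
    """1-based position of the p-th percentile in n sorted observations."""
    return (n + 1) * p / 100

def percentile_value(data, p):
    """Value at the p-th percentile, interpolating between neighbours
    when the location is fractional (assumed convention)."""
    ordered = sorted(data)
    loc = percentile_location(len(ordered), p)
    lower = int(loc)      # whole part of the 1-based position
    frac = loc - lower    # fractional part used for interpolation
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + frac * (above - below)

# Hypothetical data: 15 observations.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 35, 38, 40, 45, 48, 50]

print(percentile_location(len(data), 25))  # first quartile position
print(percentile_value(data, 25))          # first quartile
print(percentile_value(data, 50))          # median
print(percentile_value(data, 75))          # third quartile
```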