
[More label ambiguity examples]
- Is it a bot or spam account?
- Is it a fraudulent transaction?
- Is the user looking for a job?
- Are the two users the same person? (no explicit key)
- What is the input x?
  - (Images) Lighting? Contrast? Resolution?
  - What features need to be included?
- What is the target label y?
  - How can we ensure the labelers give consistent labels?
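
One way to check whether labelers are consistent is to have two of them label the same examples and measure how often they agree beyond chance. The sketch below is an illustration, not part of the original notes: it computes Cohen's kappa in plain Python and lists the examples where the labelers disagree. The label lists and the names `cohens_kappa` and `disagreements` are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers on the same examples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of examples on which both labelers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each labeler picked labels independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical class throughout
    return (observed - expected) / (1 - expected)


def disagreements(labels_a, labels_b):
    """Indices of examples that the two labelers labeled differently."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels from two labelers on the same six accounts.
labeler_1 = ["spam", "spam", "ok", "ok", "spam", "ok"]
labeler_2 = ["spam", "ok", "ok", "ok", "spam", "spam"]

print(f"kappa = {cohens_kappa(labeler_1, labeler_2):.2f}")
print("review these examples:", disagreements(labeler_1, labeler_2))
```

A low kappa, or a long list of disagreements, suggests the labeling instructions are ambiguous and need to be tightened before more data is labeled.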

- Unstructured data
  - May or may not have a huge collection of unlabeled examples x.
  - Humans can label more data.
  - Data augmentation is more likely to be helpful.
- Structured data
  - It is harder to obtain more data.
  - Human labeling may not be possible.
- Small data
  - Clean labels are critical.
  - Can get the labelers to talk to each other.
- Big data
  - Emphasis on the data process.
[Small data and label consistency]
[Improving label consistency]
[Human-level performance (HLP)]
Why measure HLP?
Important to be conservative when comparing model performance with HLP.
Should be able to persuade the business stakeholders that the model's performance is good enough.
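
One reason for caution: when the "ground truth" is itself a human label, the measured HLP may mostly reflect how often two labelers follow the same convention, not how well a human performs the task. The sketch below illustrates this with made-up numbers. The label shares and the function names are hypothetical, and it assumes labelers choose a convention independently of each other.

```python
def two_labeler_agreement(label_shares):
    """Probability that two independent labelers pick the same label."""
    return sum(p * p for p in label_shares.values())


def majority_label_accuracy(label_shares):
    """Accuracy of always predicting the most common label,
    scored against a single labeler's answer."""
    return max(label_shares.values())


# Hypothetical: 70% of labelers follow convention A, 30% follow convention B.
shares = {"convention A": 0.7, "convention B": 0.3}

hlp = two_labeler_agreement(shares)
model = majority_label_accuracy(shares)

print(f"HLP measured as labeler agreement: {hlp:.2f}")
print(f"Model that always picks the majority convention: {model:.2f}")
```

Under these assumptions the model appears to beat HLP only because it is consistent about an arbitrary convention, which is not an improvement a user would notice. Making the labeling instructions consistent first gives a more meaningful HLP to compare against.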