[speaker verification] basics

Willow · January 16, 2024

goal: high performance under real-world conditions
difficulties:
- intrinsic (speaker-side): age, emotion, manner of speaking
- extrinsic (environment-side): background noise, reverberation, channel/microphone
speaker identification: mapping a given utterance to a speaker (open set vs. closed set)
- in the closed-set case this is a multi-class classification problem (one class per known speaker)
- trained with a classification loss (e.g. softmax cross-entropy)
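
A minimal sketch of closed-set identification as multi-class classification, assuming the model has already produced one logit per known speaker. The function names and the example logits are hypothetical and only illustrate the idea; softmax cross-entropy is used here as one common choice of classification loss.

```python
import math

def softmax(logits):
    """Turn raw per-speaker scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Classification loss for one utterance whose true speaker index is `target`."""
    return -math.log(softmax(logits)[target])

def identify(logits):
    """Closed-set decision: pick the known speaker with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits for 3 known speakers (assumed values, not real data).
logits = [2.0, 0.5, -1.0]
loss = cross_entropy(logits, target=0)
predicted = identify(logits)
```

The loss is small when the true speaker already has the highest logit and grows as probability mass moves to other speakers. In the open-set case an utterance may belong to none of the known speakers, so a plain argmax like this is not enough on its own.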
speaker verification: deciding whether a given utterance matches a target speaker model
- contrastive loss: the network learns the embedding itself, so that distances between embeddings are meaningful (e.g. Siamese network)
- a portion of the data (= test set) is held out for unseen POIs
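
A minimal sketch of the contrastive-loss idea, assuming embeddings are plain lists of floats. The margin-based pairwise form below is one standard formulation and may differ from the exact loss the notes have in mind. The cosine-similarity scoring, the `margin` and `threshold` values, and all function names are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same_speaker, margin=1.0):
    """Pull same-speaker pairs together; push different-speaker pairs
    apart until they are at least `margin` away from each other."""
    d = euclidean(a, b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, target_emb, threshold=0.5):
    """Accept the identity claim if the test embedding is close enough
    to the target speaker's embedding (threshold is an assumed value)."""
    return cosine_similarity(test_emb, target_emb) >= threshold
```

Because the decision depends only on distances between embeddings, the same model can score speakers it never saw in training, which is why the test set can consist of unseen POIs.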
practice
1. model
- average pooling
- fixed input length
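2. metric
- EER (equal error rate): the operating point where the false acceptance rate equals the false rejection rate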
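
The two practice items above can be sketched as follows. This assumes frame-level features are lists of floats and that verification trials have already been scored. Averaging over frames is shown as one way to get a fixed-size vector from a variable number of frames. The EER routine is a simple threshold sweep over the observed scores, not an interpolated ROC computation. All names and example values are hypothetical.

```python
def average_pool(frames):
    """Average frame-level feature vectors over time, giving one
    fixed-size vector regardless of how many frames there are."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep the threshold over all observed scores and
    return the error rate where false acceptance and false rejection
    are closest to each other."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical trial scores (assumed values, not real results).
pooled = average_pool([[1.0, 2.0], [3.0, 4.0]])
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
```

A lower EER means better separation between same-speaker and different-speaker trials. With few trials the sweep is coarse, so evaluation toolkits usually interpolate between thresholds.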
terms:
1. POI: person of interest
