[speaker verification] basics

Willow · January 16, 2024

SPEECH PROCESSING

  • high performance is necessary under 'real world' conditions
  • difficulties (sources of variability)
    • intrinsic: age, emotion, manner of speaking
    • extrinsic: background noise, reverberation, channel/mic
  • speaker identification: mapping a given utterance to a speaker (open set vs. closed set)
    • in the "closed set" case every test speaker is known in advance, so the task becomes "multi-class classification"
    • trained with a classification loss
  • speaker verification: deciding whether a given utterance matches a target speaker model
    • contrastive loss (the network learns the embedding itself, rather than only computing a distance, e.g. Siamese)
    • a portion of the data (= test set) is held out for unseen POIs
  • practice
    1. model
       • average pooling
       • fixed input length
    2. metric
       • EER (equal error rate)
  • terms:
    1. POI: person of interest
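
The "practice" items above (average pooling and EER) can be sketched roughly as follows. This is a minimal illustration, not the setup used in the notes: it assumes frame-level features are already available as a NumPy array, scores trial pairs with cosine similarity (an assumption, the notes do not name a scoring function), and approximates the EER by sweeping thresholds over the observed scores. The function names `average_pool`, `cosine_score`, and `equal_error_rate` are hypothetical.

```python
import numpy as np


def average_pool(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) matrix into one (dim,) utterance embedding."""
    return frame_features.mean(axis=0)


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings (assumed scoring function)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(scores, labels) -> float:
    """Approximate EER from trial scores.

    labels: 1 for target (same-speaker) trials, 0 for impostor trials.
    A trial is accepted when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        accept = scores >= threshold
        far = np.mean(accept[labels == 0])   # impostors wrongly accepted
        frr = np.mean(~accept[labels == 1])  # targets wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Closest observed point to FAR == FRR; report their midpoint.
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical utterances with the same fixed length: 200 frames x 64 dims.
    utt_a = average_pool(rng.normal(size=(200, 64)))
    utt_b = average_pool(rng.normal(size=(200, 64)))
    print("cosine score:", cosine_score(utt_a, utt_b))
    print("EER:", equal_error_rate([0.9, 0.8, 0.7, 0.6, 0.4, 0.3],
                                   [1, 1, 0, 1, 0, 0]))
```

Lower EER is better: it is the operating point where the false acceptance rate and the false rejection rate are equal. With few trials the two rates rarely cross exactly, which is why the sketch reports the midpoint at the closest threshold.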