[SV] Exploring wav2vec 2.0 on speaker verification and language identification

Willow · January 16, 2024


This paper applies wav2vec 2.0, the big hit of 2020, to speaker verification (SV) and language identification (LID).

Model architecture
- w2v-encoder + avg pooling layer + fc layer (random init)
- cross-entropy loss & cosine distance to compute similarity
  - a classification-style approach
  - MTL of LID and SV
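
The pieces listed above can be sketched in plain Python to make the data flow concrete. This is a minimal illustration, not the paper's implementation: the frame vectors stand in for wav2vec 2.0 encoder outputs, the head weights are toy values, and `multitask_loss` with its `lid_weight` parameter is a hypothetical way of combining the two cross-entropy terms (the note does not say how the SV and LID losses are weighted). Taking the pooled vector as the embedding for cosine scoring is also an assumption.

```python
import math

def average_pool(frames):
    """Mean over time: T frame vectors of size D -> one utterance vector of size D."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def linear(x, weight, bias):
    """Fully connected layer; `weight` holds one row per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def cross_entropy(logits, target):
    """Negative log-softmax of the target class, shifted by the max for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings; used to score a verification trial."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multitask_loss(embedding, spk_head, lang_head, spk_id, lang_id, lid_weight=1.0):
    """Shared embedding, two FC heads, weighted sum of the two cross-entropy terms.

    `lid_weight` is a hypothetical knob, not a value taken from the paper.
    """
    spk_loss = cross_entropy(linear(embedding, *spk_head), spk_id)
    lang_loss = cross_entropy(linear(embedding, *lang_head), lang_id)
    return spk_loss + lid_weight * lang_loss

# Toy stand-ins for encoder outputs (frames x feature dims).
frames_a = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
frames_b = [[2.0, 1.0], [4.0, 2.0]]
emb_a = average_pool(frames_a)
emb_b = average_pool(frames_b)

# Verification trial: score two utterances by cosine similarity.
score = cosine_similarity(emb_a, emb_b)

# Training step: each head is a (weight, bias) pair with toy values.
spk_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
lang_head = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
loss = multitask_loss(emb_a, spk_head, lang_head, spk_id=0, lang_id=1)
```

In a real setup the same structure would be written with a tensor library, with the FC heads randomly initialized on top of the pre-trained encoder.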
Setup
- VoxCeleb1, with part of the training set held out as the dev set
- w2v encoder freeze at 10,000 steps
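
As a rough sketch of these two settings: the helper names below are hypothetical, and so are `dev_fraction` and `seed`, since the note does not say how much of the training set is held out. The note also leaves open whether the encoder is frozen up to step 10,000 or from that step onward, so the `frozen_first` flag covers both readings rather than picking one.

```python
import random

def split_train_dev(utterances, dev_fraction=0.1, seed=0):
    """Hold out a random part of the training list as a dev set.

    `dev_fraction` and `seed` are illustrative; the note gives neither.
    """
    items = list(utterances)
    random.Random(seed).shuffle(items)
    n_dev = max(1, int(len(items) * dev_fraction))
    return items[n_dev:], items[:n_dev]

def encoder_is_frozen(step, boundary=10_000, frozen_first=True):
    """Gate for encoder updates around the 10,000-step mark.

    frozen_first=True:  frozen before the boundary, trainable from it on.
    frozen_first=False: trainable before the boundary, frozen from it on.
    """
    before = step < boundary
    return before if frozen_first else not before

# Hypothetical utterance IDs standing in for the VoxCeleb1 training list.
train, dev = split_train_dev([f"utt{i:03d}" for i in range(100)])
```

In a training loop, the gate would decide at each step whether the encoder parameters receive gradient updates, while the pooling and FC layers are trained throughout.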
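Results
- t-SNE and fine-tuning results show that pre-training w/ w2v2 helps SV and LID
- MTL performs worse than STL, but the takeaway is that it reaches competitive performance with fewer parameters
- w2v features can distinguish speakers and languages (esp. in the lower layers)
  - consistent with findings in other speech processing areas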
