Prompt-driven Target Speech Diarization
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
2023, Disentangling Voice and Content with Self-Supervision [NeurIPS]
DR-DESA: "Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity"
AA+DR+NS: "Adapting Speaker Embeddings for Speaker Diarisation", in Proc. Interspeech, 2021. (Naver)
Paper review
From simulated mixtures to simulated conversations (BUT)
Published at ICASSP 2023, Naver CLOVA
MISP baseline (paper, GitHub): multimodal inputs
Uses audio features, lip regions of interest (ROIs), and i-vector embeddings. I-vectors are the key point to solve
System description
Visual front-end: a modified ResNet18-3D model for processing lip videos. They make three changes to the standard PyTorch implementation
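The MISP baseline notes list three input streams: audio features, lip ROIs, and i-vectors. Below is a minimal sketch of one plausible way to assemble them into per-speaker inputs. It assumes frame-level concatenation, with each speaker's i-vector repeated on every frame so it acts as a speaker cue. The function name `build_speaker_inputs` and this fusion scheme are hypothetical illustrations and are not taken from the MISP paper or its code.

```python
from typing import List

Frame = List[float]


def build_speaker_inputs(
    audio: List[Frame],           # T frames of audio features, shared by all speakers
    lips: List[List[Frame]],      # per speaker: T frames of lip-ROI embeddings
    ivectors: List[List[float]],  # per speaker: one i-vector, fixed over time
) -> List[List[Frame]]:
    """Return, for each speaker, T frames of [audio | lip | i-vector] features.

    Hypothetical fusion: plain per-frame concatenation, with the speaker's
    i-vector broadcast to every frame.
    """
    if len(lips) != len(ivectors):
        raise ValueError("need one lip stream and one i-vector per speaker")
    num_frames = len(audio)
    inputs = []
    for lip_seq, ivec in zip(lips, ivectors):
        if len(lip_seq) != num_frames:
            raise ValueError("audio and lip streams must be frame-aligned")
        # List '+' builds a new list per frame: audio dims, then lip dims, then i-vector dims.
        inputs.append([a + v + ivec for a, v in zip(audio, lip_seq)])
    return inputs
```

For example, with 2 frames of 2-dim audio, two speakers with 1-dim lip embeddings, and 2-dim i-vectors, each speaker gets 2 frames of 5-dim features. A real system would feed such sequences to a network that predicts each speaker's activity per frame; plain concatenation is chosen here only because it is the simplest fusion to show.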
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction, in Proc. ICLR 2022
IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2022)
2020, "Active Speakers in Context", in CVPR
2023, LoCoNet: Long-Short Context Network for Active Speaker Detection, in CVPR