Paper review: <Sampling Frequency Independent Dialogue Separation>

이용준 · September 18, 2025

Paper Review


Summary

  • A model trained at a single sampling rate (sampling frequency) can be applied to audio at different sampling rates without retraining.

Methods

Sampling frequency independence (SFI) can be achieved when two conditions are met.

  • The STFT window length and hop size are specified as fixed durations in seconds, not as fixed numbers of samples.

Example: A is an audio clip sampled at 8 kHz.
B is an audio clip sampled at 16 kHz.

Set the window length to 20 ms and the hop size to 4 ms, then convert these durations into sample counts for each sampling rate: A uses a 160-sample window and a 32-sample hop, while B uses a 320-sample window and a 64-sample hop.
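
The conversion from seconds to samples can be sketched in a few lines of Python. This is an illustration, not the paper's code: the helper names `stft_params` and `num_frames` are hypothetical, the FFT size is assumed equal to the window length, and frames are counted without padding.

```python
def stft_params(fs_hz, win_ms=20.0, hop_ms=4.0):
    """Convert a window/hop given in milliseconds into sample counts at fs_hz."""
    win = int(round(fs_hz * win_ms / 1000.0))
    hop = int(round(fs_hz * hop_ms / 1000.0))
    return win, hop


def num_frames(n_samples, win, hop):
    """Frame count of an STFT without padding: 1 + floor((N - win) / hop)."""
    return 1 + (n_samples - win) // hop


duration_s = 1.0
for fs in (8000, 16000, 48000):
    win, hop = stft_params(fs)
    frames = num_frames(int(fs * duration_s), win, hop)
    bins = win // 2 + 1      # one-sided spectrum, assuming n_fft == win
    spacing_hz = fs / win    # width of one frequency bin in Hz
    print(fs, win, hop, frames, bins, spacing_hz)
```

For a 1 s clip this prints 246 frames at every rate, while the bin count grows with the sampling rate (81, 161 and 481 bins). The bin spacing stays at 50 Hz in every case.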

  • The operations applied after the STFT are independent of the number of channels (frequency bins).

With the window and hop fixed in seconds, the number of time frames stays the same, but the number of frequency bins differs between sampling rates. Because the frequency dimension (treated as channels) varies from input to input, a layer with a fixed input size, such as a fully connected (FC) layer, cannot be applied directly. Instead, operations that accept any number of bins, such as 1D convolution or pooling along the frequency axis, should be applied first.
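
To illustrate why the layer type matters, here is a small NumPy sketch. It is not the paper's architecture: the 5-tap kernel, the mean pooling, the 16-unit FC layer and the random stand-in spectrograms are all assumptions. The same frequency-axis kernel runs on 81-bin (8 kHz) and 161-bin (16 kHz) inputs, while an FC layer sized for 81 bins cannot accept the 16 kHz input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared weights: one 5-tap kernel applied along the frequency axis.
# With a 20 ms window every bin is 50 Hz wide, so the kernel spans the same
# frequency range (5 adjacent bins) at every sampling rate.
kernel = rng.standard_normal(5)

# Hypothetical FC layer whose input size is fixed to the 81 bins of 8 kHz audio.
fc_weights = rng.standard_normal((81, 16))


def freq_conv(spec, kernel):
    """Convolve each frame of a (frames, bins) array along frequency.
    mode='same' keeps the bin count, so any number of bins is accepted."""
    return np.stack([np.convolve(frame, kernel, mode="same") for frame in spec])


def freq_pool(spec):
    """Average over frequency: (frames, bins) -> (frames,), for any bin count."""
    return spec.mean(axis=1)


# Bin counts for a 20 ms window with n_fft == window length: 81 at 8 kHz, 161 at 16 kHz.
for fs, bins in ((8000, 81), (16000, 161)):
    spec = np.abs(rng.standard_normal((246, bins)))  # random stand-in spectrogram
    pooled = freq_pool(freq_conv(spec, kernel))
    print(fs, spec.shape, pooled.shape)
    try:
        spec @ fc_weights
        print("  FC layer: ok")
    except ValueError:
        print("  FC layer: shape mismatch")
```

The convolution keeps one value per bin, so its output still lines up with the input bins at either rate. The pooling collapses the frequency axis to a fixed size regardless of the bin count.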

Results

  • In the table above, "CNN 8 kHz" denotes a CNN trained at an 8 kHz sampling rate, and "CNN 48 kHz" a CNN trained at 48 kHz. "CNN 8->48kHz" denotes the CNN trained at 8 kHz and then used for inference at 48 kHz without retraining. Surprisingly, its performance is better!

Discussions

  • This approach can be used to adapt a model trained at one sampling rate to audio at other sampling rates.