Summary
- A model trained at a single sampling rate (sampling frequency) can be applied to audio at other sampling rates without retraining.
Methods
Sampling frequency independence (SFI) can be achieved when two conditions are met.
- The STFT window length and hop size are specified as fixed durations in seconds rather than as fixed numbers of samples.
Example) A: an audio clip sampled at 8 kHz.
B: an audio clip sampled at 16 kHz.
Set window size = 20 ms and hop size = 4 ms, then convert these durations into sample counts for each sampling frequency: a 160-sample window and 32-sample hop for A, and a 320-sample window and 64-sample hop for B.
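A minimal Python sketch of the conversion in the example: the window and hop are fixed in seconds and turned into sample counts for each sampling rate. The helper name `seconds_to_samples` is hypothetical, and rounding to the nearest integer is an assumption; the bin count and spacing follow from a standard real-input FFT.

```python
def seconds_to_samples(duration_s: float, sample_rate: int) -> int:
    """Convert a duration fixed in seconds into a whole number of samples (hypothetical helper)."""
    return int(round(duration_s * sample_rate))

WINDOW_S = 0.020  # window length fixed at 20 ms
HOP_S = 0.004     # hop size fixed at 4 ms

for name, sr in [("A", 8_000), ("B", 16_000)]:
    win = seconds_to_samples(WINDOW_S, sr)
    hop = seconds_to_samples(HOP_S, sr)
    # A real-input FFT of `win` samples has win // 2 + 1 bins spaced sr / win Hz apart.
    print(f"{name}: {sr} Hz -> window {win}, hop {hop}, "
          f"{win // 2 + 1} bins at {sr / win} Hz spacing")
# A: 8000 Hz -> window 160, hop 32, 81 bins at 50.0 Hz spacing
# B: 16000 Hz -> window 320, hop 64, 161 bins at 50.0 Hz spacing
```

The bin spacing (50 Hz) is identical at both rates because the window duration is fixed; only the number of bins changes.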
- The operations applied after the STFT do not depend on the number of channels (frequency bins).
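For a clip of a given duration, the number of time frames is the same at every sampling rate, but the number of frequency bins differs: the bin spacing stays fixed, so a higher sampling rate yields more bins covering its wider frequency range. A layer with a fixed input width, such as a fully connected (FC) layer, therefore cannot be applied directly, because the frequency dimension (treated as channels) varies between inputs. An operation that does not depend on that size, such as a 1D convolution or pooling along the frequency axis, should be applied first.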
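The NumPy sketch below illustrates the second condition under stated assumptions: a plain Hann-windowed STFT (the window type is assumed), a hypothetical layer `freq_conv_then_pool` that slides shared kernels along the frequency axis and then takes a global max over frequency, and a random stand-in FC layer. It is not the method's actual architecture; it only shows that the feature size reaching the FC layer no longer depends on the sampling rate.

```python
import numpy as np

WINDOW_S, HOP_S = 0.020, 0.004  # window and hop fixed in seconds (20 ms, 4 ms)

def stft_magnitude(x, sample_rate):
    """Hann-windowed magnitude STFT; returns shape (n_frames, win // 2 + 1)."""
    win = int(round(WINDOW_S * sample_rate))
    hop = int(round(HOP_S * sample_rate))
    n_frames = 1 + (len(x) - win) // hop
    window = np.hanning(win)
    frames = np.stack([x[i * hop : i * hop + win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def freq_conv_then_pool(spec, kernels):
    """Hypothetical layer: convolve each kernel along the frequency axis of every
    frame, then take a global max over frequency. Output: (n_frames, n_kernels),
    independent of the number of frequency bins."""
    out = np.empty((spec.shape[0], len(kernels)))
    for k, kernel in enumerate(kernels):
        for t, frame in enumerate(spec):
            out[t, k] = np.convolve(frame, kernel, mode="valid").max()
    return out

rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 5))     # 4 stand-in "learned" kernels, 5 bins wide
fc_weights = rng.standard_normal((4, 2))  # stand-in FC layer: 4 features -> 2 outputs

for sr in (8_000, 16_000):
    t = np.arange(sr) / sr                # 1 s of audio
    x = np.sin(2 * np.pi * 440.0 * t)     # 440 Hz test tone
    spec = stft_magnitude(x, sr)
    feats = freq_conv_then_pool(spec, kernels)
    logits = feats @ fc_weights           # same FC weights work at both rates
    print(sr, spec.shape, feats.shape, logits.shape)
# 8000 (246, 81) (246, 4) (246, 2)
# 16000 (246, 161) (246, 4) (246, 2)
```

Since the bin spacing is 50 Hz at both rates, each 5-bin kernel covers the same physical frequency span regardless of the input's sampling rate, which is what makes sharing the kernels across rates meaningful.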
Results

- In the table above, "CNN 8 kHz" denotes a CNN trained at an 8 kHz sampling rate and "CNN 48 kHz" a CNN trained at 48 kHz. "CNN 8->48kHz" denotes a CNN trained at 8 kHz and used for inference only at 48 kHz, without retraining. Surprisingly, its performance is better!
Discussions
- This approach can be used to adapt a model trained at one sampling rate (and hence one frequency range) to data recorded at other sampling rates.