Audio samples for "NU-GAN: High resolution neural upsampling with GAN"



Section Ⅰ: Examples for single speaker 22 kHz to 44 kHz upsampling

Here we evaluate our algorithm on an internal single-speaker dataset consisting of roughly 20 hours of audio from a female speaker recorded in a professional studio.

Original low resolution (22 kHz) | Sinc Interpolation (44 kHz) | NU-GAN (44 kHz) | Original high resolution (44 kHz)
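
For readers unfamiliar with the sinc interpolation baseline named in the columns above, the following is a minimal sketch of 2x upsampling by Hann-windowed sinc interpolation. This page does not say how the baseline was actually computed, so the function name `sinc_upsample_2x`, the Hann window, and the `half_width` parameter are all illustrative assumptions, not the authors' implementation.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_upsample_2x(samples, half_width=16):
    """Double the sample rate of `samples` (e.g. 22.05 kHz -> 44.1 kHz).

    Assumption: the ideal (infinite) sinc kernel is truncated to
    `half_width` input samples on each side and tapered with a Hann
    window. Samples outside the signal are treated as zero.
    """
    n = len(samples)
    out = []
    for m in range(2 * n):
        if m % 2 == 0:
            # Even output positions coincide with input samples.
            out.append(float(samples[m // 2]))
            continue
        t = m / 2.0  # output position measured in input-sample units
        centre = int(math.floor(t))
        acc = 0.0
        for k in range(centre - half_width + 1, centre + half_width + 1):
            if 0 <= k < n:
                d = t - k  # distance from input sample k
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[k] * sinc(d) * window
        out.append(acc)
    return out


# Example: a 1 kHz tone sampled at 22.05 kHz, upsampled to 44.1 kHz.
fs_low = 22050
tone = [math.sin(2 * math.pi * 1000 * i / fs_low) for i in range(400)]
tone_44k = sinc_upsample_2x(tone)
```

Sinc interpolation only reconstructs the content already present below the original Nyquist frequency (about 11 kHz for 22 kHz audio); it adds nothing above it, which is why it serves as a baseline against a learned upsampler.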


Section Ⅱ: Examples for multi-speaker 22 kHz to 44 kHz upsampling

Here we use the cleanraw subset of the publicly available DAPS dataset. The DAPS dataset consists of 10 male and 10 female speakers, each reading 5 different scripts. For each speaker, we hold out 1 script (script5) for testing and train on the rest of the data.
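
The hold-out split described above can be sketched as follows. This is only an illustration: the filename pattern (`f1_script1_cleanraw.wav` and similar) is an assumption about how the cleanraw files are named, and `split_daps` is a hypothetical helper, not part of any released code.

```python
import re

# Assumed filename pattern: <speaker>_<script>_cleanraw.wav,
# e.g. "f1_script1_cleanraw.wav" or "m10_script5_cleanraw.wav".
NAME_RE = re.compile(r"^(?P<speaker>[fm]\d+)_(?P<script>script\d+)_cleanraw\.wav$")


def split_daps(filenames, held_out_script="script5"):
    """Split filenames into (train, test), holding out one script per speaker."""
    train, test = [], []
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        if match.group("script") == held_out_script:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Example with synthetic names: 10 female + 10 male speakers, 5 scripts each.
speakers = [f"{sex}{i}" for sex in ("f", "m") for i in range(1, 11)]
names = [f"{spk}_script{s}_cleanraw.wav" for spk in speakers for s in range(1, 6)]
train_files, test_files = split_daps(names)
```

With 20 speakers and 5 scripts each, this yields 80 training recordings and 20 test recordings, one held-out script per speaker.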

Original low resolution (22 kHz) | Sinc Interpolation (44 kHz) | NU-GAN (44 kHz) | Original high resolution (44 kHz)


Section Ⅲ: End-to-end TTS samples at high resolution of 44 kHz

In this section, we present end-to-end TTS samples at 22 kHz and at 44 kHz resolution, demonstrating the usefulness of our NU-GAN upsampler in the TTS pipeline.

TTS @ 22 kHz | TTS @ 44 kHz