
Choppy generation using pre-trained tacotron-gst model checkpoint #536

astricks opened this issue May 7, 2020 · 1 comment

astricks commented May 7, 2020

Hi,

I am using the tacotron-gst model for speech generation (magnitude spectrogram output) and getting choppy generated audio, as someone else noted here. My inference output files are here.

I'm running inference in an NVIDIA TensorFlow Docker container. Here are my inference logs.

The text I am trying to generate is from the M-AILABS dataset itself. My inference file contains the one line below:

en_US/by_book/female/judy_bieber/the_master_key/wavs/the_master_key_10_f000002|UNUSED|How Rob Served a Mighty King.
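
To be explicit about how I read that line (a minimal sketch; the column meanings are my assumption, inferred from the M-AILABS CSV layout, and the middle field is literally "UNUSED" here):

```python
# One row of the inference CSV: wav path | unused column | transcript.
# The column interpretation is my assumption from the M-AILABS layout.
line = ("en_US/by_book/female/judy_bieber/the_master_key/wavs/"
        "the_master_key_10_f000002|UNUSED|How Rob Served a Mighty King.")

wav_path, unused, transcript = line.split("|")
print(wav_path)    # en_US/by_book/.../the_master_key_10_f000002
print(transcript)  # How Rob Served a Mighty King.
```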

If I understand correctly, the provided checkpoint has been trained on the M-AILABS dataset, which means it has seen this particular sentence/audio pair.

  1. Is sample_step0_0_infer_mag.wav representative of the quality to be expected?
  2. Can I swap out Griffin-Lim and use WaveNet to improve the audio quality? (A sketch of the Griffin-Lim step, as I understand it, follows this list.)
  3. Can you please share some Tacotron-GST audio samples you have generated (I found only the non-GST Tacotron samples in the docs), so that we know what to expect? My expectations are set by the Google Tacotron team's audio samples on their webpage.
  4. In short: is there any way to tell (perhaps from the output spectrogram image) what is causing the low-quality generation, and what to change to improve it? The model, the vocoder, or both?
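
For question 2, this is roughly the Griffin-Lim reconstruction I believe happens after the model predicts magnitudes (a minimal sketch using librosa; the file name, log-magnitude handling, STFT parameters, and sample rate are all my assumptions and would need to match the actual tacotron_gst config):

```python
import numpy as np
import librosa
import soundfile as sf

# Load the predicted magnitude spectrogram. The file name is
# hypothetical; the actual inference output may be stored differently.
mag = np.load("sample_step0_0_infer_mag.npy")  # assumed shape: (frames, freq_bins)

# If the model predicts log magnitudes, undo the compression first
# (assumption; depends on the config).
mag = np.exp(mag)

# librosa expects (freq_bins, frames).
mag = mag.T

# Griffin-Lim iteratively estimates the phase that the model does not
# predict. Too few iterations is one classic source of metallic,
# choppy-sounding output. The hop/win lengths here are assumptions and
# must match the STFT settings used in training.
audio = librosa.griffinlim(mag, n_iter=60, hop_length=256, win_length=1024)

# Sample rate is an assumption; it must match the training data.
sf.write("reconstructed.wav", audio, 22050)
```

My thinking: if cranking up n_iter does not help, the problem is more likely in the predicted spectrogram (the model) than in the phase reconstruction (the vocoder), which is what question 4 is trying to get at.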
astricks (Author) commented

I'd really appreciate any advice on this.
