I mean something similar to Talqu but for English, or something that allows you to make your own text-to-speech box? Bonus points for any similarities to UTAU or other voice synths where you can share your vb. I highly doubt there's one like this out there, but I'm very interested to see! T_T
Tacotron 2 (what Talqu is based on) can actually be pretty good for English TTS, though it requires some coding.
There is a tutorial, though. This is the video on how to set up the data
You can download the voice file once it's trained, rename it to have the extension .pt, and put it into Talqu. You would need the pro version of Talqu, though, because an English TTS for some reason needs the pronunciation editor box (the third box below the other input boxes); otherwise it would only accept Japanese text and would sound very cursed.
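For the renaming step, here is a minimal Python sketch. It assumes the trained checkpoint was downloaded as a single file; the name "my_voice" is a made-up example, not something the tutorial specifies. It copies the file instead of renaming it, so the original download stays untouched.

```python
import shutil
from pathlib import Path


def copy_with_pt_extension(checkpoint_path):
    """Copy a downloaded checkpoint to a sibling file whose name ends in .pt."""
    src = Path(checkpoint_path)
    if src.suffix == ".pt":
        # Already has the extension, nothing to do.
        return src
    # Append ".pt" to the full name, so "my_voice" becomes "my_voice.pt".
    dst = src.with_name(src.name + ".pt")
    shutil.copy2(src, dst)
    return dst


# Hypothetical usage; replace the path with wherever the file was saved:
# copy_with_pt_extension("Downloads/my_voice")
```

The resulting .pt file is the one to put into Talqu.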
I spent a rushed couple of hours on it (about 40 min of training the model and 45 short recordings of myself, total file duration less than 5 min), and the results were far from satisfactory.
I ran into problems with TensorFlow (the error message said 'tf.contrib' is not supported by TF 2.0+, despite there being a line to use TF 1.x), and none of the code I added from StackExchange answers resolved it. What worked for me was restarting the runtime a few times. Literally turn it off and on again.
The generated files were either noise or stuttering. The output is only as good as the input, so I advise anyone who wants to try this to use much more than 5 min of data.
I ran it again for about 1.5 hours, this time with under 9 minutes of recordings. (This experiment was sponsored by both procrastination and impatience.)
Still gives static/noise, but now it can actually do words.
Interestingly, it can output some phrases present in my recordings, but not others. I used some sentences from http://festvox.org/cmu_faf/index.html, several from Project Gutenberg's Sherlock Holmes, and a couple from the English Wikipedia page for UTAU.
As for anyone reading this wondering about distribution, part 2 of the YouTube guide (the training one) shows the model must be publicly accessible on Google Drive to be used in a Colab synthesis notebook (the place where you input the text to generate speech). So, theoretically, someone could share their voicebank like this.
Thank you so much!
I didn't know there was a text-to-speech system this easy. I tried to train NNSVS but didn't get a usable result. And it's very limited to Japanese.
Hey guys, speaking as a Tacotron2 expert (and also one of the Uberduck contributors...): to create a really decent voice that covers most phonemes, I suggest at least 30 minutes of speech data (total wav duration). The best would be about 1-2 hours of speech data (the more the better!)... So I would say roughly 1000-5000 wav files of sentences, or more, is enough for good training.
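To check a recording folder against that 30-minute suggestion before training, something like the sketch below could work. It uses only Python's standard `wave` module, so it assumes plain PCM .wav files sitting in one folder; the function names and the folder name are made up for illustration.

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Add up the duration of every .wav file directly inside folder."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total


def describe_dataset(folder, minimum_minutes=30.0):
    """Report the total minutes of audio against a suggested minimum."""
    minutes = total_wav_seconds(folder) / 60.0
    status = "meets" if minutes >= minimum_minutes else "is below"
    return (f"{minutes:.1f} min of audio, which {status} "
            f"the suggested {minimum_minutes:.0f} min minimum")


# Hypothetical usage:
# print(describe_dataset("wavs"))
```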
I HAVE FOUND THE SOLUTION! I was browsing Anna Nyui's website and I discovered she did something called... Coefront? From what I can find, it's a do-it-yourself text-to-speech thing where you can also download other people's voices, sometimes even commercial ones.
It's called COEFRONT! Right now I think it only does Japanese? But I found it through Anna Nyui. It only allows your own account though, so you can't share accounts or voicebanks that aren't yours. Even if you own it. Thoughts on this?
You don't really have to use Arpabet for training: you can just use normal sentences in the regular alphabet. Arpabet conversion is optional, but if you want it for slightly more accurate pronunciation, it's best to use a Tacotron2 fork that has Arpabet conversion built in. Also, you don't really use "_" symbols anywhere other than filenames, and "-" can be used in your training transcript in its normal way (for example, in words like "air-space" or "merry-go-round").
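As a rough illustration of a plain-text transcript, here is a small Python sketch. It assumes the "path|text" filelist layout used by the NVIDIA Tacotron 2 repository; the notebook from the tutorial may expect something different, and the "wavs/" prefix and the file names are made up.

```python
def make_filelist_line(wav_name, sentence):
    """Build one 'path|text' transcript line from a normal sentence."""
    # Collapse stray whitespace; hyphens and punctuation are kept as written.
    text = " ".join(sentence.split())
    if "|" in text:
        raise ValueError("the transcript must not contain the '|' separator")
    # "wavs/" is an assumed folder name, not something the thread specifies.
    return f"wavs/{wav_name}|{text}"


lines = [
    make_filelist_line("my_voice_001.wav", "The merry-go-round was still turning."),
    make_filelist_line("my_voice_002.wav", "They cleared the air-space over the city."),
]
print("\n".join(lines))
```

Underscores appear only in the file names here, and the hyphenated words go into the transcript unchanged.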