UtaForum Should I make a Diffsinger?

Suzuki Hoshi

Ruko's Ruffians
Defender of Defoko
It seems INCREDIBLY time-consuming. How am I supposed to record THAT MUCH DATA? Like, 5-10 minutes would be fine, but @heynotloid (sorry about tagging you) said they'd do their Diffsinger data stuff for 14 HOURS.
Also, I'm not perfect, and I don't know Japanese well.
 

Kawaiine Is Queen

Momo's Minion
I've seen some fairly good Diffsingers with around 10 minutes of data, and I know there's some way to do XLS so you wouldn't even need to record in Japanese. If you really want Japanese data, and a lot of it, you could record for a voice conversion AI such as RVC or Diff-SVC first and use that to get singing data instead (with the slight loss of a more "personalized" singing style, but being specific about what audio you convert could probably minimize that). I don't know what the quality of Diffsinger models made with voice conversion tends to be, but I know it's a recommended method on the Diffsinger Google doc.

If you use the voice conversion method, there are probably some restrictions and/or possible legal concerns with the audio you're allowed to use, but I'm not sure what they'd be. If you have Discord, I think there's still an official(?) Diffsinger server you could ask in if you can't get an answer.
 

Suzuki Hoshi

Ruko's Ruffians
Defender of Defoko
Thread starter
I recorded both Japanese and English audio.
 

SaKe

Ruko's Ruffians
Defender of Defoko
The useful max for Diffsinger training data is 2 hours; past that there's very little noticeable improvement (source: Diffsinger LIEE Project Doc). Smaller datasets can sound great, too.
Fun fact: Gumi's SynthV VB was created using just 30 minutes of singing data that, don't quote me on this, was reused from her V6 voicebank. It's kind of cool she managed to sound so good with such small data, especially for a commercial project.
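If you want to check how many minutes you've actually recorded before worrying about the 2-hour mark, here's a minimal Python sketch. It's not part of any Diffsinger tooling; it just assumes your recordings are uncompressed PCM .wav files somewhere under one folder, and the function name `total_minutes` is made up for this example.

```python
import wave
from pathlib import Path


def total_minutes(folder):
    """Sum the duration of every .wav file under `folder`, in minutes.

    Assumption: files are plain PCM WAV, which is all the standard
    library `wave` module can read.
    """
    total_seconds = 0.0
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration of one file = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60
```

Calling `total_minutes("my_recordings")` (the folder name is just a placeholder) returns the total as a number you can compare against whatever target you've picked, like 10 or 15 minutes.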
 

Suzuki Hoshi

Ruko's Ruffians
Defender of Defoko
Thread starter
Yeah, I recorded 15 minutes of data, but I don't know how to label it or do the Colab part.