UtaForum should i make a diffsinger?

Suzuki Hoshi · Apr 5, 2025

It seems INCREDIBLY time-consuming. How am I supposed to do THAT MUCH DATA? Like, 5-10 minutes would be fine, but @heynotloid (sorry about tagging you) said they'd do their Diffsinger data stuff for 14 HOURS.
Also, I'm not perfect, and I don't know Japanese well.

Kawaiine Is Queen · Apr 6, 2025

I've seen some fairly good Diffsingers with around 10 minutes of data, and I know there's some way to do XLS so you wouldn't even need to record in Japanese. If you really want Japanese data, and a lot of it, you could record for a voice conversion AI such as RVC or Diff-SVC first and use that to get singing data instead (with the slight loss of a more "personalized" singing style, but being specific about what audio you convert could probably minimize that). I don't know what the quality of Diffsinger models made with voice conversion tends to be, but I know it's a recommended method on the Diffsinger Google doc.

If you use the voice conversion method, there are probably some restrictions and/or possible legal concerns with the audio you're allowed to use, but I'm not sure what they'd be. If you have Discord, I think there's still an official(?) Diffsinger server you could ask in if you can't get an answer.

Suzuki Hoshi · Apr 6, 2025

I did Japanese audio and English audio.

SaKe · Apr 6, 2025

The useful max for Diffsinger training data is 2 hours; beyond that there's very little noticeable improvement (source: Diffsinger LIEE Project Doc). Smaller datasets can sound great, too.
Fun fact: Gumi's SynthV VB was created using just 30 minutes of singing data that, don't quote me on this, was reused from her V6 voicebank. It's kind of cool she managed to sound so good with such small data, especially for a commercial project.
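If you want to see where your own recordings land against numbers like these, here's a rough sketch that adds up the length of every WAV file in a folder. It only uses Python's standard library and isn't part of any Diffsinger tooling. The `total_minutes` function and the `recordings` folder name are made up for illustration, and it assumes plain PCM WAV files, which is what the `wave` module can read.

```python
import wave
from pathlib import Path


def total_minutes(folder):
    """Sum the duration of every .wav file directly inside `folder`, in minutes."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration of one file = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60.0


if __name__ == "__main__":
    # "recordings" is a placeholder folder name, not something Diffsinger requires.
    print(f"{total_minutes('recordings'):.1f} minutes of audio")
```

This counts the full length of each file, silence included, so the amount of actual singing will be a bit lower than what it prints.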

Suzuki Hoshi · Apr 6, 2025

Yeah, I did 15 minutes of data, but I don't know how to label it or do the Colab part.
