
Traditional VB vs ENUNU/NNSVS bank

Row 4

Ruko's Ruffians
Defender of Defoko
Hello again!

I'm kind of feeling out the vibes of this before I commit to it, so I'd like some community opinion.

Firstly, I'm very happy with the current progress of 26's newest bank set, and I'm going to continue developing and recording it post-release in January.
Despite the aggressive accent on the JP bank (which I'm kind of content with, as I'm not trying to explicitly imitate a native Japanese tone), the tone variance between the recorded samples makes 26 sound MUCH more natural than previous versions. (Putting that BA in theatre to USE mama.)
I have no plans at this time to re-cast voice providers for the other Row 4 banks.

Secondly, I'm seeing a lot of "new" (as in, created within the time of my absence in the community whilst I was pursuing higher education, yeehaw) kinds of voicebanks using ENUNU/NNSVS AI model training.
I don't see a reason not to create a "26 AI" bank, but at the same time I'm kind of strong-arming a traditional bank out.

My personal pros and cons are this:

PROS:
Much more natural sounding banks
Would properly encapsulate the tones and ranges in vocals I have
Would become (if I'm understanding this correctly) compatible with other vocal synths like DeepVocal

CONS:
Whole lotta data to process
Not as streamlined for widespread use


With the above information in consideration, I still have a few questions.

What do you, as a regular or semi-regular UTAU user, feel the benefit of an AI-compatible/configured bank is?

How would/do you use regular banks in comparison to AI banks?

What do you prefer in terms of tone? (More power, character based, softer, etc)
 

SunnyWolves

Ruko's Ruffians
Defender of Defoko
I feel as if the benefits of an AI voicebank are the added realism, ease of use, and bonus abilities (vocal modes, cross-lingual, auto pitch).
I typically use standard vocals when there is no AI voicebank for that character, or when the AI voicebank is hard to use and I don't feel like using it. (Mine Laru has a DiffSinger voicebank which I love, but it takes forever to render and has messed-up consonants, so I use his UTAU vocal more often.)
As for tone, I typically prefer more powerful ones.
Also, what made you pick NNSVS/ENUNU over DiffSinger?
 

Row 4

Ruko's Ruffians
Defender of Defoko
Thread starter
I actually haven't decided between any of the AI options, so I'm also open to suggestions on that front!~ Whichever is most streamlined would be ideal for me tbh.
 

SunnyWolves

Ruko's Ruffians
Defender of Defoko
I don't know how creation goes, but DiffSinger seems to be easier to make (I haven't made one, only heard things about it, so don't take this as fact) and is definitely more realistic and easy to use, although it takes a lot longer for the audio to render.
Here's an example of DiffSinger if you haven't heard it:
 

Berrweary

Teto's Territory
Defender of Defoko
I would say, if you have the time and resources, go for an AI bank, but it requires a lot of things, like a good mic with clean audio (the AI can mess up training if there's unwanted audio).

I would go for normal UTAU VBs cuz I scream a lot and idk how that'll affect the AI.
I do have an AI VB in development, but also a better VCV which is similar to Ruko's kire/power bank.

The only benefits I see from AI banks are that they sound real and great. If you want an English VB, then it's better to label and train than to do a multi-pitch VCCV (that's my opinion), as VCCV takes longer to configure and record, whereas an AI bank like DiffSinger just needs labeled audio and training (which does cost a lot of time and resources like GPU and CPU).

Regular banks are easier to use for me cuz they don't take their sweet time to render. DiffSinger or ENUNU takes forever to render one note on my computer, so if I do a couple of pitch bends I have to wait another 10 minutes or so to hear the changes. (That's why I tune with a normal bank and just render it on AI banks later.)

Also, another pro for AI banks is that they can help with accents a bit using parallel training... or so I've heard.


In terms of tone I prefer Power. I don't really know why, but it just sounds nice?

Idk if any of this is relevant, but thought I should say this :P
 
