UTAU Alias Overflow?

Icy-Q

Momo's Minion
Hey everyone, so I'm working on an English VCV+CVVC reclist and I need to know how many aliases I can put into the oto.ini without it overflowing. Because I was worried about overflowing the program, I made a lot of cutbacks to the VCV list, and it's kinda turned into a Lite VCV. Does anyone know the exact number of aliases you can have without completely destroying the program?

((Btw, my profile is empty because I literally made it just to ask this question. I couldn't find information on this topic ANYWHERE. All I know is that Canon overflowed the program when she tried making an English VCV voicebank, so...))
 

Icy-Q

Momo's Minion
Thread starter

Thank you so much! I was having trouble finding a thread that completely answered my question!

Hi @Icy-Q :D
You're welcome to ask questions here~! :wink:

Actually, I don't know what this is so I'll just tag the people I know are the greatest/most knowledgeable about this:
@kimchi-tan , @Kiyoteru , @_caustic_ , @Pupuomena , @Tomato Hentai

(@The people I tagged, I'm so sorry to bother you ~ TT^TT)

Thank you as well!
 

Nohkara

Pronouns: He/him
Supporter
Defender of Defoko
Hello! I did a little calculation!

Standard (American) English has 16 vowels* and 23 consonants**.

*5 of the 16 vowels are diphthongs, and the schwa sound is included. Any fancy extra "nasal version of /{/" (like & in CZ's English list) is not counted.
**CC combinations of any kind (e.g. "st" like in "step", "bl" like in "blah" or "nt" like in "can't") are NOT counted.

  • So, 16 vowels multiplied by 23 consonants gives 368 possible CV combinations. Then...

"V CV" and "- CV" settings: 17 x 368 = 6256 (oto lines)
"CV" and "V C" and "V C-" settings: 3 x 368 = 1104 (oto lines)
"V V" and "- V" settings: 16 x 17 = 272 (oto lines)

So...

6256 + 1104 + 272 = 7632 (oto lines)
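The arithmetic above can be double-checked with a few lines of Python. This is only a sketch of the same counting, using the post's figures; the "17" appears to be the 16 vowels plus the "-" (rest) that can also precede a CV or V.

```python
# Rough oto-line estimate for the single-consonant part of a full English VCV,
# using the figures from the post (16 vowels, 23 consonants).
VOWELS = 16
CONSONANTS = 23

cv = VOWELS * CONSONANTS            # 368 possible CV combinations
contexts = VOWELS + 1               # a CV/V can follow any vowel, or "-" (a rest)

vcv_and_start_cv = contexts * cv    # "V CV" and "- CV"
cv_vc_vc_end = 3 * cv               # "CV", "V C" and "V C-"
vv_and_start_v = VOWELS * contexts  # "V V" and "- V"

base_total = vcv_and_start_cv + cv_vc_vc_end + vv_and_start_v
print(cv, vcv_and_start_cv, cv_vc_vc_end, vv_and_start_v, base_total)
# 368 6256 1104 272 7632
```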

But that isn't enough!

You will need to add CC (and some CCC) sounds as well! For CC sounds, I advise CVVC recordings only.

  • If you make full CVVC recordings for each -CC, and full VCC-/VCCC- recordings for each CC-/CCC- (there are about 35 different -CC and 62 CC-/CCC- clusters), you will have...

"- CCV", "V CC" and "V CC-" settings: 16 x 35 x 3 = 1680
"VCC-/VCCC-" settings: 16 x 62 x 1 = 992

So, 992 + 1680 = 2672 (oto lines)
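The same kind of check for the "full" cluster option, as a sketch using the post's approximate cluster counts (35 final, 62 initial):

```python
VOWELS = 16
FINAL_CC = 35     # approx. number of distinct -CC clusters (from the post)
INITIAL_CC = 62   # approx. number of distinct CC-/CCC- clusters (from the post)

# "- CCV", "V CC" and "V CC-": three alias types per vowel and -CC cluster
full_final = VOWELS * FINAL_CC * 3
# "VCC-/VCCC-": one alias per vowel and CC-/CCC- cluster
full_initial = VOWELS * INITIAL_CC * 1

full_cc_total = full_final + full_initial
print(full_final, full_initial, full_cc_total)  # 1680 992 2672
```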

  • OR record just -CC/CC/CC- sounds, like in Delta's or CZ's VCCV English VBs, which will decrease the recording AND oto count a LOT!

"- CC" and "CC": 35 x 2 = 70
"CC-" and "CCC-": 62

So, 70 + 62 = 132 (oto lines)
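For the lighter option the vowel drops out of the count entirely, because these aliases don't depend on the neighbouring vowel. A sketch with the same cluster counts:

```python
FINAL_CC = 35     # approx. number of distinct -CC clusters (from the post)
INITIAL_CC = 62   # approx. number of distinct CC-/CCC- clusters (from the post)

# "- CC" and "CC": two vowel-independent aliases per -CC cluster
lite_final = FINAL_CC * 2
# "CC-" and "CCC-": one alias per initial cluster
lite_initial = INITIAL_CC

lite_cc_total = lite_final + lite_initial
print(lite_final, lite_initial, lite_cc_total)  # 70 62 132
```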

  • Depending on whether you record CC/CCC in full or not, you will have

7632 + 2672 = 10 304 (oto lines)

OR

7632 + 132 = 7764 (oto lines)

For comparison, a basic Japanese VCV VB has 938 oto lines on average (source here)!

So, in short: ENG VCV will have roughly 8 to 11 times as many oto lines as a Japanese VCV.
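Dividing the two English totals by the 938-line Japanese figure quoted above gives the ratio directly. This is a sketch that just reuses the numbers from the post:

```python
BASE = 7632     # single-consonant VCV/CVVC estimate
FULL_CC = 2672  # full VCC-style cluster recordings
LITE_CC = 132   # -CC/CC/CC- recordings only (Delta/CZ style)
JP_VCV = 938    # average basic Japanese VCV oto count quoted in the post

for label, cc in (("full CC", FULL_CC), ("lite CC", LITE_CC)):
    total = BASE + cc
    print(f"{label}: {total} oto lines, {total / JP_VCV:.1f}x a Japanese VCV")
# full CC: 10304 oto lines, 11.0x a Japanese VCV
# lite CC: 7764 oto lines, 8.3x a Japanese VCV
```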

So, short answer: Yes, full VCV English IS technically possible, BUT it's ridiculously long to work with.

I think that for VCV English, you can skip the schwa sound with no problem, since schwa is easily replaced with other vowel sounds (also, I advise skipping "extra" vowels/consonants such as "&" or "dd" from CZ's ENG list). In general, AVOID adding extra sounds!

Also, for CCC sounds, I think there's no point in recording full VCCC sounds, so record CCC- only (e.g. "tests"). And to make extra sure the VB will be workable, I suggest keeping it to one pitch only (aka. no multipitch, Kire etc.).

Please note that my calculations are only rough estimates!
The real count may be more or less than what I calculated, but the order of magnitude should be correct.

Good luck with your ENG VCV project!
 

Icy-Q

Momo's Minion
Thread starter

Hello! Thank you for the calculations on the full VCV! I guess I should have provided some information on the voicebank though, oops.

The voicebank and reclist will be based more on Delta's CVVC than on CZ's VCCV, just because I'm more experienced with Delta's list and personally prefer it over CZ's VCCV. The reason is that I kinda... use Teto English... way too much. So I'm more used to the phonemes in Delta's list OTL

So the reclist will be in X-SAMPA and borrow a lot of aspects from Delta's list AND some of the Vocaloid phonemes.
Here are the added phonemes, and why I added some of them:

Vowels:
v+i+ (Unstressed "I". fIGHt, kIte)
a3 (Vocaloid Equivalent: Q@. cAR)
i3 (Vocaloid Equivalent: I@. EAR)
u3 (Vocaloid Equivalent: U@. sURe)
e3 (Vocaloid Equivalent: e@. AIR)
o+3 (Vocaloid Equivalent: O@. pOUR)
il (fEEL)
ul (pOOL)
el (tELL)
ou+l (pOLe)
{l (pAL)
i+l (stILL)
u+l (pULL)
o+l (stALL)
3l (gIRL)

((If you're wondering why I need all these extra vowels, it's because they provide a smoother output from the program AND add more precision to the voicebank. It's kind of hard to explain. Let's say you have to record {_li_li_l-}. The {_l-} affects the outcome of the second "li", so I have the reclist call for separate recordings: {_li_li-}, plus "il" as its own vowel. I'll get more into this later.))

Consonants:
tt (Tapping "t"/Used for consonant to "t" transitions: sTop)
4 (Same as Vocaloid. Tapping "d": buTTer)
R (Same as Vocaloid. Trilled "r": tieRRa)

When I release the VB and reclist, I plan on releasing multiple methods of recording!:
A Full CVVC+CC and LITE VCV+VV (Possibility of multipitch, 3 or 4 pitches will probably be the max. Meant for a happy medium of range and smoothness!)
A Full CVVC+CC and full VCV+VV (Slim possibility of multipitch, there will probably be BARELY enough room for one pitch. Meant specifically for smoothness)
A Full CVVC+CC and NO VCV+VV (Guaranteed possibility of multipitch, will allow A LOT of pitches. Meant for wide range. Similar to GUMI_English.)

Okay, so reclist previews! The one thing CZ's VCCV and my VCV will have in common will be that there are multiple reclists. This is done to provide a good amount of organization to the madness that I'm calling VCV.

First up is the VCV portion. The only vowels in it will be: a, i, u, e, ou+, {, i+, v+, u+, @, o+ and 3 (the simple vowels).
The samples will look something like:

{_a_i_ha_hi_a-
_a_bi_ts+a_ts+i_ba-
_a_di_fa_fi_da-
_a_gi_dz+a_dz+i_ga-
_a_ki_la_li_ka-}

And so on. There will be about 800 samples (give or take) with 5 aliases each. I kinda added a bunch to it this morning and I'm not quite finished, so I'm not 100% sure.
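The preview samples seem to follow one fixed pattern, _V1 _C1V2 _C2V1 _C2V2 _C1V1-, with C1 left empty for a plain vowel-to-vowel line. The sketch below rebuilds the preview strings from that inferred pattern (the function name is hypothetical, not part of the reclist) and multiplies out the approximate alias count from the post.

```python
def vcv_sample(v1, v2, c1, c2):
    """Build one VCV recording string in the (inferred) preview pattern:
    _V1 _C1V2 _C2V1 _C2V2 _C1V1-   (c1 may be "" for a vowel-to-vowel line).
    """
    parts = [v1, c1 + v2, c2 + v1, c2 + v2, c1 + v1 + "-"]
    return "".join("_" + p for p in parts)

# Consonant pairs taken from the preview; X-SAMPA symbols are kept as whole strings
pairs = [("", "h"), ("b", "ts+"), ("d", "f"), ("g", "dz+"), ("k", "l")]
for c1, c2 in pairs:
    print(vcv_sample("a", "i", c1, c2))

SAMPLES = 800            # "about 800 samples (give or take)"
ALIASES_PER_SAMPLE = 5
print(SAMPLES * ALIASES_PER_SAMPLE)  # 4000
```

Treating consonants as whole strings matters here, because symbols like "ts+" and "dz+" are several characters long.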

This is the basic CVVC. There ARE VCV aliases in it, but only for people using the reclist for VCV, and they're there to save room. The voicebank will basically be useless without this list. There are also glottal stops ['].
Here are some samples:

{_a_a-
_a_'a_'-
_ha_ha_h-
_ba_ba_b-
_ts+a_ts+a_ts+-
_da_da_d-
_fa_fa_f-}

The CVVC list for the simple vowels is done (I believe...) and will contain about 348 samples with ~3-5 aliases each! The list for the complex vowels isn't done yet, but it'll probably have about 700-800 samples with ~3-5 aliases each.
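The CVVC previews follow a simpler _CV_CV_C- pattern, with a bare vowel giving _V_V-. Here is a sketch that rebuilds those strings (the helper name is hypothetical) and turns the sample and alias estimates above into a rough range of oto lines for the CVVC lists.

```python
def cvvc_sample(c, v="a"):
    """Rebuild a preview string: _CV_CV_C-, or _V_V- when there is no consonant."""
    if not c:
        return f"_{v}_{v}-"
    return f"_{c}{v}_{c}{v}_{c}-"

for c in ["", "'", "h", "b", "ts+", "d", "f"]:
    print(cvvc_sample(c))

# Estimates from the post: 348 simple-vowel samples, 700-800 complex-vowel
# samples, each with roughly 3-5 aliases.
low = 348 * 3 + 700 * 3
high = 348 * 5 + 800 * 5
print(low, high)  # 3144 5740
```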

The CC list hasn't even been started. OTL
But it will only have consonants oto'd, no vowels.
The samples will look something like this: {CCV_CCV_CC-} and/or {VC_CV_CC-} depending on the consonants.
And the aliases will look like this: {- CC} {C C} and {CC -}
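As a sketch of the three alias shapes for a cluster: the helper below is hypothetical, and joining longer clusters with spaces for the "C C" shape (e.g. "s t r") is my own assumption, since the post only shows the two-consonant form.

```python
def cc_aliases(cluster):
    """Return the "- CC", "C C" and "CC -" aliases for one consonant cluster.

    `cluster` is a list of X-SAMPA symbols, so multi-character symbols such as
    "ts+" stay intact. Space-joining clusters longer than two is an assumption.
    """
    joined = "".join(cluster)
    return [f"- {joined}", " ".join(cluster), f"{joined} -"]

print(cc_aliases(["s", "t"]))  # ['- st', 's t', 'st -']
```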

I hope this gives you all the information you need!
If you have any questions and/or suggestions, feel free to reply!
The limit for OTO length in UTAU is 2^15, or 32768 lines. @_caustic_ knows a workaround, but I wouldn't rely on it.

Thank you, but I'm limiting the reclist to under 32768 lines so users of the voicebank won't have to wait 19 hours, 32 minutes and 17 seconds for the voicebank to load, lmao. Thank you a lot though!
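To check a finished oto.ini against the 2^15 = 32768 figure quoted above, you can count its non-blank lines, since each line is one `wavname=alias,offset,consonant,cutoff,preutterance,overlap` entry. The sketch below does that, plus a rough "how many pitches fit" helper. Whether the limit applies per oto.ini or to the whole multipitch bank isn't settled in this thread, so `max_pitches` assumes the whole bank, and the two sample entries are made-up placeholders.

```python
OTO_LINE_LIMIT = 2 ** 15  # 32768, the limit quoted in the thread

def count_oto_entries(text):
    """Count non-blank lines in oto.ini text (one alias entry per line)."""
    return sum(1 for line in text.splitlines() if line.strip())

def max_pitches(lines_per_pitch, limit=OTO_LINE_LIMIT):
    """Equal-sized pitch sets that fit under the limit, ASSUMING the limit
    applies to the bank as a whole rather than to each oto.ini."""
    return limit // lines_per_pitch

# Placeholder entries with dummy timing values, for illustration only
sample = (
    "_a_bi_ts+a_ts+i_ba-.wav=a bi,0,0,0,0,0\n"
    "_a_bi_ts+a_ts+i_ba-.wav=i ts+a,0,0,0,0,0\n"
)
print(count_oto_entries(sample))  # 2
print(max_pitches(7764))          # 4
```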
 
