What makes a CV_JP oto.ini configuration good?

ΑKYLAS

Teto's Territory
Thread starter
I’ve been seeing tons of different methods of otoing CV-style voicebanks from both native and overseas users. Despite using a plethora of voicebanks and reading through every single resource I could find, I keep running into vastly different opinions that contradict each other.

I guess my question is: does a good oto.ini configuration vary from voicebank to voicebank, or is there a standard formula that, if executed well, prevents problems and guarantees really good performance?

I've prepared some examples. Each sample is taken from an entirely different voicebank. Their publishers/otoers are experienced/advanced users.

Sample A [KA]

Sample A.png

Sample B [KA]

Sample B.png


Sample C [KA]

Sample C.png

Sample D [KA]

Sample D.png



As you can see, each sample has different configuration settings:

Sample A claims that since "K" is a hard consonant, its overlap will be set with a low (but positive) value.

Sample B claims that since "K" is a hard consonant, its overlap will be set with a negative value.

Sample C claims that, although "K" is a hard consonant, the voice provider emphasized the consonant part of the sample (k) in this case, so its overlap will have half the value of the preutterance. (Basically, it will be otoed like a soft consonant.)

Sample D claims that since "K" is a hard consonant, the sample should have silence before the actual sound starts. The overlap has positive value.
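
For anyone who wants to compare entries like these side by side, here is a minimal sketch that reads oto.ini lines (`file=alias,offset,consonant,cutoff,preutterance,overlap`, values in milliseconds) and labels the overlap style. The numbers in it are made up for illustration; they are not the values from the screenshots above, and the names `OtoEntry`, `parse_oto_line` and `overlap_style` are hypothetical helpers, not part of UTAU.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # ms from the start of the file
    consonant: float     # ms from offset (fixed, unstretched region)
    cutoff: float        # ms; negative values are measured from offset
    preutterance: float  # ms from offset
    overlap: float       # ms from offset


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one 'file=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    filename, rest = line.strip().split("=", 1)
    alias, *values = rest.split(",")
    # Empty fields are treated as 0 here; that is an assumption of this sketch.
    offset, consonant, cutoff, preutterance, overlap = (
        float(v) if v.strip() else 0.0 for v in values
    )
    return OtoEntry(filename, alias, offset, consonant, cutoff, preutterance, overlap)


def overlap_style(entry: OtoEntry) -> str:
    """Rough label for how the overlap relates to the preutterance."""
    if entry.overlap < 0:
        return "negative"
    if entry.preutterance > 0 and entry.overlap >= entry.preutterance / 2:
        return "soft-style"
    return "low positive"


# Hypothetical example lines, one per style described above.
examples = [
    "ka.wav=ka,20,100,-250,60,10",   # A-like: low positive overlap
    "ka.wav=ka,45,100,-250,40,-10",  # B-like: negative overlap
    "ka.wav=ka,30,110,-250,70,35",   # C-like: overlap is half the preutterance
    "ka.wav=ka,5,120,-250,80,15",    # D-like: early offset, silence before the "k"
]

for line in examples:
    entry = parse_oto_line(line)
    print(entry.alias, entry.offset, entry.preutterance, entry.overlap, overlap_style(entry))
```

With these made-up numbers the A-like and D-like lines both come out as "low positive". The difference between them is where the offset sits relative to the start of the consonant, which you can only judge from the waveform, not from the numbers alone.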


These are the most popular variations of hard consonant otoing I found. (I didn't mention vowels, smooth consonants, and glides since I find them the easiest to configure.)



Despite their differences, I found the above methods to perform well (each vocal differs slightly from the others, but overall they are easy to use and produce smooth results; it doesn't sound like something is wrong). I know what the parameters do; I just don't know what the best configuration is, since I tested them all and honestly 3 out of 4 are pretty usable. I am overthinking this and becoming paranoid. Please share your thoughts and opinions! Thank you!

:smile:
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko
The only correct one is D, because the rest are cutting out the actual attack of the consonant itself. If you look at a "k" consonant in the context of multiple syllables, such as in an entire line of lyrics being sung, or in a VCV voicebank, you'll notice that there's a short pause before it every single time. We need to account for the silence before the consonant, and be able to hear the entire consonant. This applies to all plosives and affricates.
 

ΑKYLAS

I've read your CV OTOing guide so honestly I was expecting this answer!

Are there any set values for the offset and the overlap? I've seen a lot of variations and sometimes the "gap" of silence makes the voicebank's performance a little awkward.
 

Kiyoteru

There aren't. The problem with CV recorded as standalone syllables is that you pretty much just have to guess what a reasonable amount could be and check it by using the voicebank. For any voicebank recorded with more than one syllable at a time, you can look at the audio itself and key points in the audio to determine where all of the OTO parameters are supposed to go, and there's no guesswork involved.
In fact, if you were to record a CV voicebank in strings of syllables, you could use the very end of the previous vowel as it's fading out for the overlap of the OTO. That way, the timing from the overlap to the preutterance is exactly the same as the real length of the consonant you recorded. However, this only works if the voicebank is STRICTLY CV, and there are occasions where the mismatched vowels are more noticeable and you will have to adjust the STP/Overlap/Preutterance in the note properties.
In a CVVC voicebank, the way that the CV parts are configured relies on the fact that every note is preceded by a VC note. The CV by itself doesn't need to handle the job of creating a smooth transition from the previous syllable to the next; it just needs to connect to the VC.
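
The string-recording idea above can be sketched roughly like this, assuming you have already marked a few points in the audio by hand (all in milliseconds from the start of the file). The function names, the `lead_in_ms` margin and the example timings are hypothetical; this only illustrates the arithmetic and is not an established tool or a set of recommended values.

```python
def oto_from_marks(prev_vowel_fade_ms, vowel_start_ms, stable_vowel_ms,
                   sample_end_ms, lead_in_ms=10.0):
    """Derive oto values from hand-placed marks in a syllable-string recording.

    prev_vowel_fade_ms: where the previous vowel finishes fading out
    vowel_start_ms:     where the vowel of this syllable begins
    stable_vowel_ms:    where the vowel has settled (end of the fixed region)
    sample_end_ms:      where this syllable's usable audio ends
    lead_in_ms:         assumed small margin placed before the overlap mark
    """
    offset = prev_vowel_fade_ms - lead_in_ms
    return {
        "offset": offset,
        # Overlap sits on the fade-out of the previous vowel.
        "overlap": prev_vowel_fade_ms - offset,
        # Preutterance sits where the vowel starts.
        "preutterance": vowel_start_ms - offset,
        "consonant": stable_vowel_ms - offset,
        # Negative cutoff means "measured from offset" rather than from the file end.
        "cutoff": -(sample_end_ms - offset),
    }


def format_oto_line(filename, alias, p):
    """Write the values in oto.ini field order."""
    return (f"{filename}={alias},{p['offset']:.1f},{p['consonant']:.1f},"
            f"{p['cutoff']:.1f},{p['preutterance']:.1f},{p['overlap']:.1f}")


# Hypothetical marks for the "ka" inside a recorded string of syllables.
params = oto_from_marks(prev_vowel_fade_ms=1200, vowel_start_ms=1290,
                        stable_vowel_ms=1350, sample_end_ms=1700)
print(format_oto_line("string01.wav", "ka", params))

# The gap between overlap and preutterance equals the recorded consonant
# length (including the short silence before the "k").
assert params["preutterance"] - params["overlap"] == 1290 - 1200
```

Because the overlap-to-preutterance distance comes straight from the recording, nothing here is guessed except the small lead-in before the overlap mark.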
 
