Need help otoing a VCV bank


Ruko's Ruffians
Defender of Defoko
So I just finished recording a bank after 2 years of not recording and I have no clue how to oto. I tried back in the old days of CV but was met with lackluster results. If someone could help me out, I'd very much appreciate it.

Thanks for reading.



UTAU Sensei
Defender of Defoko
The first thing you'll need to do is set up the aliases. For each line in the OTO, duplicate the line until there is one entry per syllable in the file name. For example, ka_ka_ki_ka_ku.wav needs 5 entries. Then alias each entry with its syllable in HIRAGANA. Our example becomes [か][か][き][か][く]. Finally, in front of each syllable, put the vowel of the previous syllable in ROMAJI, with one normal space between it and the hiragana. For the first syllable of the sample, use a hyphen instead. The example becomes [- か][a か][a き][i か][a く].
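If it helps to see the aliasing rule spelled out, here is a rough Python sketch of it. The function name `vcv_aliases` and the `KANA` table are made up for this illustration, and the table only covers a handful of syllables, so a real bank would need the full kana set. It also assumes file names use romaji syllables separated by underscores, like the example above.

```python
# Partial romaji -> hiragana table (hypothetical, for illustration only).
# A real voicebank would need every syllable in the reclist.
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
}

def vcv_aliases(filename):
    """Return one VCV alias per syllable in a file name like ka_ka_ki_ka_ku.wav."""
    stem = filename.rsplit(".", 1)[0]          # drop the .wav extension
    syllables = [s for s in stem.split("_") if s]
    aliases = []
    prev_vowel = "-"                           # first syllable gets a hyphen
    for syl in syllables:
        aliases.append(f"{prev_vowel} {KANA[syl]}")
        prev_vowel = syl[-1]                   # last romaji letter is the vowel
    return aliases

print(vcv_aliases("ka_ka_ki_ka_ku.wav"))
# ['- か', 'a か', 'a き', 'i か', 'a く']
```

Each returned alias corresponds to one duplicated OTO line for that sample, in order.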

This is all very tedious, so the easier way to go about it is to use a base OTO generator. You can either use the one built into OREMO and Setparam (hiragana file names only), or the one in Moresampler (accepts romaji with each syllable separated by an underscore). For Moresampler, use "hiragana only" for the aliasing options.

Once all of the aliases are done, you simply need to go through and adjust the parameters to fit. (The generators can give you a rough fit, which is handy if you want to quickly hear how the voicebank will sound in UTAU, but before you can release it as a finished and complete voicebank you need to make these adjustments.) Unlike CV, the principle behind OTOing VCV is very simple and doesn't vary at all depending on the type of consonant.

Leave the overlap (green line) at its current value (unless it's incredibly long, in which case, 80 is a good all-around amount to use).
Move the offset (left blue blank) so that the area between it and the overlap is a consistent, stable vowel.
Move the preutterance (red line) to where the consonant ends and next vowel begins.
Move the consonant (pink area) to where the next vowel starts being stable and consistent.
Move the cutoff (right blue blank) to where the next vowel starts to fade out.
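For reference, these five values end up on one line per alias in oto.ini, in the order `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, all in milliseconds. As far as I know, the consonant, preutterance and overlap are measured from the offset, and a negative cutoff is also measured from the offset rather than from the end of the file. Below is a small Python sketch for reading and writing such lines. The `OtoEntry` class, `parse_line` helper and the numbers in the example are invented for illustration, not taken from a real voicebank.

```python
from dataclasses import dataclass

def _fmt(n):
    # Write 250.0 as "250" and 12.5 as "12.5".
    return f"{n:.3f}".rstrip("0").rstrip(".")

@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # left blank
    consonant: float     # fixed (pink) area
    cutoff: float        # right blank
    preutterance: float  # red line
    overlap: float       # green line

    def to_line(self):
        nums = (self.offset, self.consonant, self.cutoff,
                self.preutterance, self.overlap)
        return f"{self.filename}={self.alias}," + ",".join(_fmt(n) for n in nums)

def parse_line(line):
    """Parse one oto.ini line; empty numeric fields are treated as 0."""
    filename, rest = line.rstrip("\n").split("=", 1)
    alias, *nums = rest.split(",")
    return OtoEntry(filename, alias, *(float(n) if n else 0.0 for n in nums))

# Made-up values, just to show the layout.
entry = OtoEntry("ka_ka_ki_ka_ku.wav", "a き", 1250, 300, -400, 100, 80)
print(entry.to_line())
# ka_ka_ki_ka_ku.wav=a き,1250,300,-400,100,80
```

This sketch assumes aliases contain no commas and that the file is already decoded to text. UTAU voicebanks are commonly saved as Shift-JIS, so the encoding needs handling when reading a real oto.ini.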

This is an example of what it would look like in a real voicebank.

If it's difficult to distinguish features of the waveform, you may need to switch to spectrogram view.

Vowels are nice consistent horizontal lines, and consonants will appear like gaps, noise, etc. depending on what it sounds like. This view can reveal features that are otherwise hard to understand.


Teto's Territory
Or you can just use Moresampler to get it done faster, then tweak the Moresampler OTO in setParam.
