The "best" way to oto VCV?

Nohkara

Pronouns: He/him
Supporter
Defender of Defoko
Nota bene: this discussion is mainly for advanced UTAU users, and the topic might confuse completely new UTAU users. I recommend that beginners read this thread with caution.

=====

I have been in UTAU for 4 years now and have downloaded hundreds of voicebanks, and I have found that people oto VCV differently.

1) The most common way people oto their VCV: if the VCV was recorded with a BGM guide at 100 BPM (many of the old, first VCV VBs like Luna or Momo were recorded at this tempo), every sound gets the following settings:

Preutterance (red line): 300
Overlap (green line): 100
Consonant (pink area): 450
Blank (blue right area): -600 OR -650

And if it was recorded at 120 BPM (this tempo is more common in newer voicebanks in general), it gets the following settings:

Preutterance (red line): 250
Overlap (green line): 83 OR 83.3
Consonant (pink area): 375
Blank (blue right area): -583

This oto method is super common in VCV because it is the first method nearly all new users are taught in both the Japanese and overseas UTAU communities. Another reason it is the most common is that it's super easy and repetitive, and it needs very little editing (usually fixing the Offset timing is the only thing this method requires). I think this method is called "Universal oto" (please correct me if I'm wrong)!

I have noticed that a VB otoed with this method works, but it commonly has a weird "accordion"-like effect on the sound when moving from note A to note B.

Other methods I have found people using to oto their VCV:

2) They keep the relation between Preutterance and Overlap so that Preutterance is always 3 times larger than Overlap (just like in universal oto), BUT the big difference is that they keep the values as small as possible. Let's say Preutterance: 180, Overlap: 60, for example.

3) Another method I have seen always keeps Preutterance at double the Overlap, e.g. Preutterance: 300, Overlap: 150. However, I personally didn't like how this method sounded on VCV.

I once asked on Twitter why people commonly use a 3:1 ratio rather than 2:1 in VCV, and one answer I got was that "it's because 3:1 works best and causes the least errors while using". Is this true? Are there big differences between 3:1 and 2:1, and if so, what are they, and which is the best in general for VCV?

4) And the last method I have seen uses no fixed ratio between Preutterance and Overlap in any VCV setting. Basically, all consonant sounds (except the stop sounds K/T/P) are tightly sandwiched between Preutterance and Overlap like this:

[Attached image: Screen Shot 2017-08-07 at 10.22.45.png]

As with method 3), I didn't really like how it sounded, and in some places the consonant or overlap sounded kind of off...

So, after this long description, my question is: what's the "best" way overall to oto VCV, and WHY?

What are the pros and cons of each of the methods 1-4 mentioned above? Which method sounds the least "accordion"-like or otherwise the least weird/"autotuned" (is this the right word to describe it?), and do resamplers affect how it sounds? I have heard that UTAU with Universal oto sounds better in Moresampler, but I can't say anything about that as I'm a UTAU-Synth user only.
 

kimchi-tan

Your local Mikotard
Global Mod
Defender of Defoko
From what I've experienced, the 3:1 methods are definitely the way to go; I rarely get any crossfading errors with this method. The 2:1 method gets a LOT of crossfading errors. This can easily be observed in most of moresampler's generated otos, as it uses this method, which is why a lot of people hate its otos.

EDIT: After some fiddling around, ratios higher than 2:1 tend to have way fewer crossfading errors.
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko
I don't really like bothering with all this ratio stuff, because I see it as base values. If you want to do a completely custom VCV OTO then the only value that should stay the same is overlap. It would cover the previous vowel up to where it starts to destabilize/fade out, so that the natural transition from V to C to V doesn't get caught up in the crossfade. All the other parameters should be moved around to fit the waveform in particular. Using ratios and base values is just a method of getting closer to the final position, and might sound "good enough" if you can't expend a lot of effort.

This is the formula for calculating base values, based on recording tempo.
Consonant = (10000/tempo) * 4.5
Cutoff = 0 - (10000/tempo) * 7
Preutterance = (10000/tempo) * 3
Overlap = 10000/tempo
Offset Increment = (10000/tempo) * 6

I copied this from the code of my VCV generator, but it was based off the behavior I saw in OREMO's generator.
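For anyone who wants to try the numbers, here is a minimal Python sketch of the formula above. The function name `vcv_base_values` and the dictionary keys are just my own illustrative choices, not anything taken from OREMO or an actual generator. All values are in milliseconds, as in oto.ini.

```python
def vcv_base_values(tempo_bpm: float) -> dict:
    """Return VCV base oto values (in ms) for a given recording tempo.

    Hypothetical helper: it only restates the formula from the post above.
    """
    unit = 10000 / tempo_bpm  # this is also the Overlap value
    return {
        "consonant": unit * 4.5,
        "cutoff": -unit * 7,          # negative: measured from the offset
        "preutterance": unit * 3,     # 3:1 ratio against overlap
        "overlap": unit,
        "offset_increment": unit * 6,  # distance between consecutive offsets
    }


for tempo in (100, 120):
    values = vcv_base_values(tempo)
    print(tempo, {name: round(ms, 1) for name, ms in values.items()})
```

At 100 BPM this gives Preutterance 300, Overlap 100, and Consonant 450, and at 120 BPM it gives 250, 83.3, and 375, which matches the values in the opening post. Note that the formula gives a cutoff of -700 at 100 BPM, which differs from the -600 or -650 listed there.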
 

Dangosan

Jellie Bellie Pete Rat Gummie Candie
Defender of Defoko
My question is: how do I work out the tempo of my VCV string?
I really hate it when a VCV rendering sounds robotic. (That's why I use presamp+moresampler for VCVs; the quality of the transitions does get affected by the resampler/wavtool in use, although you can increase the STP manually to make a VCV sound better.)
 

Nohkara

Pronouns: He/him
Supporter
Defender of Defoko
Thread starter
People use a BGM guide in OREMO (the BGM maker usually gives the tempo in its txt file) or record with a metronome.

But if you have no idea what the recorded tempo is and want to work it out, you might want to use Audacity*. You can generate beats in Audacity at a chosen tempo, so keep generating beats at different tempos until you find one that matches or is close to your recording's tempo.


*If you recorded without any BGM guide, the tempo might vary between samples by plus or minus 10 BPM or more, depending on how well you kept the tempo during recording.
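As an alternative to matching beats by ear, a rough estimate can be calculated from the spacing between syllables. This is only a sketch under assumptions: it assumes one syllable per beat (so one beat lasts 60000 / tempo milliseconds) and that you have read the start time of each syllable by hand, for example from the Audacity timeline. The function name `estimate_tempo_bpm` is made up for illustration.

```python
from statistics import median


def estimate_tempo_bpm(onsets_ms):
    """Estimate the recording tempo from syllable start times in ms.

    Assumes one syllable per beat; the median gap is used so that one
    badly timed syllable does not skew the result too much.
    """
    if len(onsets_ms) < 2:
        raise ValueError("need at least two syllable start times")
    gaps = [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]
    return 60000 / median(gaps)


# Hand-measured start times (ms) of four syllables in one VCV string
print(estimate_tempo_bpm([500, 1000, 1500, 2000]))
```

If the result lands near a common guide tempo such as 100 or 120 BPM, that is probably the tempo to use for the base values.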
 
