Oto explanation?

animecrazy13

Momo's Minion
I just recorded an append for my UTAU and I'm stuck with the oto. I've looked up tutorials but I'm still confused. Can someone explain it for me? Thanks :D
 

Yue Nagareboshi

Senior Tutor
Tutor
Defender of Defoko
It's pretty easy:

>> OFFSET - The blue area on the left. The program discards this part of your recording completely when it processes the sample.

>> CONSONANT - The pink area. The name may suggest that this zone is ONLY for the consonant sound, but it isn't. This parameter marks the part of your sample that the engines will not stretch (the white area is what gets stretched). The pink area should always cover the whole consonant plus the part of the vowel where the volume is still changing. Keep moving the edge of the pink zone until you reach the point where the vowel volume is stable.

>> CUTOFF - The blue area on the right side. Just like the offset, it's a discarded part of the recording. Place it a bit before your vowel sound starts to fade out.

>> VOWEL - The white area. It has no value of its own; it is whatever is left over once the other values are set. The white area should contain only a vowel sound with stable pitch and volume, because the UTAU rendering engines stretch this area as long as needed for the note the UTAULOID is singing. The wider the vowel, the more "natural" it will sound; a really small one may result in a robotic sound (examples are Defoko, Chikune Kenta, and Sofi).

>> PREUTTERANCE - Known as the "red line". It should be placed right in the middle of the consonant-to-vowel transition. This line is important because it determines whether your UTAULOID sings properly.

>> OVERLAP - The green line. This is the point where the previous note ends (where the vowel of the previous note finishes its fade-out) and where your consonant finishes fading in on the current note.

Here is a diagram by adriann (Daatura's voicer), who explained this graphically; it should help you understand better.

tumblr_m8o8k8ap9q1rvcrnao1_1280.png

Source: http://adriann-has-no-ovaries.tumblr.com/post/29306743331/so-i-was-extremely-off-the-previous-time-i-drew

There are several methods to oto a bank. I recommend an easy (and fast) way to make a CV bank sound nice: the Cz-method.

DISCLAIMER: While I find this method really easy and fast, and it gives good results, many users have different oto techniques. Use the technique you feel most comfortable with!

~~~

To oto with this method, you need at least 0.2 to 0.3 seconds of silence before the consonant sound (if your recording already has that space without adding it in Audacity or another audio editor, that's fine).

Set your parameters like this:
OFFSET: 200
CONSONANT: 200 (and adjust as needed)
CUTOFF: -600 (and adjust as needed)
PREUTTERANCE: 90 (this will be adjusted)
OVERLAP: 60 (this will remain like this always).

What you basically do is move your offset until the green line is right where your consonant starts; then you move your red line to the point where your consonant ends and your vowel starts. After that, adjust the end of your pink zone to where the stable vowel starts, and your cutoff to the end of your stable vowel, before it starts to fade out. DONE.
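
If you'd rather measure than drag, the same steps can be written as arithmetic. This is only a minimal sketch: the function name and the four timestamps (all in ms from the start of the wav) are made up for illustration, and it assumes the negative form of cutoff, which counts from the offset.

```python
OVERLAP_MS = 60  # kept fixed in this method


def cz_oto_values(consonant_start, vowel_start, stable_vowel_start, fade_start):
    """Return (offset, consonant, cutoff, preutterance, overlap) in ms.

    All four inputs are hypothetical measurements read off the waveform,
    in milliseconds from the start of the wav file.
    """
    # The green line (offset + overlap) must land on the start of the consonant.
    offset = consonant_start - OVERLAP_MS
    if offset < 0:
        raise ValueError("not enough silence before the consonant")
    # The red line (offset + preutterance) must land where the vowel starts.
    preutterance = vowel_start - offset
    # The pink zone ends where the vowel becomes stable.
    consonant = stable_vowel_start - offset
    # Negative cutoff: length of the used region, counted from the offset.
    cutoff = -(fade_start - offset)
    return offset, consonant, cutoff, preutterance, OVERLAP_MS


print(cz_oto_values(250, 330, 450, 850))  # (190, 260, -660, 140, 60)
```

The numbers still need a check by ear afterwards; this only reproduces where the lines would end up.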

A special case appears with sounds like P, K, and T (as well as the derived sounds that go with Cyv or Cwv: consonant + "i" or "u" sound + vowel). You need to leave a silence before the consonant so it sounds natural.

Capturaoto.png


What I usually do is move the offset until the red line is at the start of the consonant sound, and then move the red line to its right place. Since the green line sits 30 ms to the left of the red line's starting position (90 - 60), this leaves 30 milliseconds of silence between the fade-out of the previous vowel and the consonant.
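
The 30 ms figure can be checked with a few lines. A small sketch, assuming the starting values from the list above (preutterance 90, overlap 60); the function names and the example timestamp are only illustrative.

```python
DEFAULT_PREUTTERANCE_MS = 90  # starting position of the red line
OVERLAP_MS = 60               # green line, left unchanged


def plosive_offset(consonant_start):
    """Offset that puts the red line's starting position on the consonant start."""
    return consonant_start - DEFAULT_PREUTTERANCE_MS


def silence_before_consonant(consonant_start, offset):
    """Gap between the green line (previous note's end) and the consonant."""
    green_line = offset + OVERLAP_MS
    return consonant_start - green_line


offset = plosive_offset(300)  # consonant starts 300 ms into the wav
print(offset, silence_before_consonant(300, offset))  # 210 30
```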

For vowels, I set both the preutterance and the overlap to 0, place them where the vowel is stable, and use UTAU's crossfade tool. "Crossfade" gives you really nice results blending the vowels (Tools > Built-in Tools > Crossfade; or press "u" on your keyboard).

Try working this way, and if you need feedback just drop me a message and I'll try to check it and make adjustments (not do your whole oto) if needed, ok?

Hope this info is helpful for you.
 

animecrazy13

Momo's Minion
Thread starter
Yue Nagareboshi link said:
Try to work it this way, and if you need a feedback just drop me a message and I'll try to check it and make adjustements (not make all your oto) if needed, ok?

I've tried to follow what you said and it was helpful, but she still sounds "off".
Could you check it? (I'm not asking you to do the oto, just to tell me what's wrong so I can fix it and learn for myself) :D
here's a link to the voice bank : http://www.mediafire.com/?qjhie75ja2g2rm7

thanks :D
 

irei1as

Teto's Territory
(Edit:
Warning - while you were typing 2 new replies have been posted. You may wish to review your post.
Oh well, I'll input my reply anyway. The more the merrier, isn't it?)



Human speech is very complex. For a computer a .wav is just numbers, not the syllable "ka". Also, every person says the exact same sample in a different way.
So, instead of a very expensive and hard-to-code AI, UTAU uses human input to understand how to manipulate those particular samples. That manual information given to the program is the "hand-made" oto.ini.


Ah, by the way, to benefit from the oto you need to treat your ust:
http://utaforum.net/index.php?topic=393.0

If you don't do that then you aren't using your oto.
Also after a change of oto you may need to clear the cache for the changes to take effect.

First let me explain the structure of each line of the oto.ini (it's a plain text file that you can open in programs like Notepad). Feel free to skip this part, as it's mainly technical information that isn't needed to use UTAU.
Each line starts with the filename of the sample being described, followed by the equals sign (=).
Then usually comes an "alias" followed by a comma (,) and then five numbers, also separated by commas.

For example:
a.wav=あ,23,100,18,87,87

The alias can be omitted if it's not needed. For example:
xx.wav=,23,102,1,89,0

There can also be more lines than samples in the voicebank, because you can "duplicate" lines to use different settings for the same sample. For example:
a.wav=あ,23,100,18,87,87
a.wav=*a,23,10,18,100,-10

(Note that the values here are just random numbers; your own samples will need different values.)
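
For the curious, here is how one of those lines could be split by a script. It's a minimal sketch that follows the field order used in this post (alias, offset, consonant, cutoff, preutterance, overlap); the class and function names are made up, and an empty number is treated as 0 as an assumption.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float
    overlap: float


def parse_oto_line(line: str) -> OtoEntry:
    """Split 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap'."""
    filename, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected an alias and five numbers, got: {line!r}")
    alias = fields[0]  # may be empty when the alias is omitted
    # Assumption: an empty numeric field counts as 0.
    numbers = [float(x) if x.strip() else 0.0 for x in fields[1:]]
    return OtoEntry(filename, alias, *numbers)


print(parse_oto_line("a.wav=*a,23,10,18,100,-10"))
```

When reading a real oto.ini from disk, check the file's text encoding first; many Japanese voicebanks are not saved as UTF-8.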

In the oto configuration there are six values to edit for each sample (duplicated lines, too): Alias, Offset, Consonant, Cutoff, Preutterance and Overlap.
They are commonly known, respectively, as "alias", "blue zone of the start", "pink zone", "blue zone of the end", "red (vertical) line" and "green line".
All the numerical values are in milliseconds (ms).

There is also a part not described by a number of its own: the zone left in white. I'll call it Vowel, but you may also know it as the "white zone (of the middle)".
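
To show how the white zone falls out of the other numbers, here is a small sketch. It relies on the cutoff convention as I understand it: a positive cutoff is counted back from the end of the wav, a negative one is the length of the used region counted from the offset. The function name and the example lengths are only illustrative.

```python
def zones(wav_length, offset, consonant, cutoff):
    """Return the pink and white zones as (start, end) pairs in ms from the wav start."""
    pink_start = offset
    pink_end = offset + consonant
    if cutoff >= 0:
        # Positive cutoff: blue zone measured back from the end of the file.
        used_end = wav_length - cutoff
    else:
        # Negative cutoff: used length measured from the offset.
        used_end = offset - cutoff
    return {"pink": (pink_start, pink_end), "white": (pink_end, used_end)}


# A 1000 ms sample: both spellings of the cutoff give the same white zone.
print(zones(1000, 200, 200, -600))
print(zones(1000, 200, 200, 200))
```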

As a special note: even though we talk about "Consonant" and "Vowel" parts, they are not literally the spoken consonant or vowel of the sample.
The consonant zone may contain spoken vowels, and the vowel zone may even be a voiced consonant sound. You'll understand this from the description of each part.


Changing the oto in UTAU is simple. Open UTAU, select the voicebank in the project properties and then go to tools->voicebank settings.
Or just press Control+G with no note selected (if you have a note selected, the editor for that particular note appears first).
There, just click on the line you want to modify.

For the alias, you need to type it in its box (you usually can't copy-paste) and press "set" so it's saved. If you make any change and forget to press "set" you may lose the change.
The others are numerical values. You can type the numbers, but it's more common to use "Launch Editor" and make the changes with the graphical interface.

How to use UTAU's built-in editor so that all the zones and lines are shown:
At first you see the waveform of your sample on a white background. At the left there is also a red vertical line, and under it a green line.

Move the mouse pointer to the start of the sample and it will turn into a "+". While it is still a "+", click at the top left or the bottom left (outside the lines) and drag to the right, and a pink zone will appear.
Do the same again once the pink zone is shown and the blue zone on the left will appear. (Moving the blue zone also moves the vertical lines.)
You can move the red vertical line and the green line the same way (click with the "+" and drag). The green line has more freedom: it can even go into the blue zone or before the start of the sample (grey zone). The red line can't enter the blue zone.
Now if you scroll to the end of the sample, you can drag the right edge to make the blue zone on the right appear.
Note that the white zone that remains has changed as you moved the pink and blue zones.


Now let me describe in depth each subject.

-Alias-

As the name says, an "alias" is another name for a sample.

So if we use "1" as the alias of the sample "one.wav", UTAU will use the sound from "one.wav" whether the note in the piano roll is called "one" (the filename) OR "1" (the alias).

This is useful for supporting various writing systems (like romaji and hiragana).

Note1: Alias takes priority over filename. For any note, UTAU first looks for a sample with that alias; if it can't find one, it tries the filenames. And, well, if it finds neither, it just plays silence (you don't have that sample to play).
Note2: The first alias is used. If two samples have the same alias, the one closer to the start of the list will always be played.
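
The two notes above amount to a simple lookup order. This is just a sketch of that order as described here, not UTAU's actual code; the function name and the sample list are invented for the example.

```python
def find_sample(note, entries):
    """Pick the wav for a note name.

    entries: list of (filename, alias) pairs in oto.ini order.
    Returns the filename, or None when nothing matches (silence).
    """
    # 1) Aliases are checked first; the earliest matching line wins.
    for filename, alias in entries:
        if alias and alias == note:
            return filename
    # 2) Otherwise fall back to the filename without its extension.
    for filename, alias in entries:
        if filename.rsplit(".", 1)[0] == note:
            return filename
    return None


entries = [("one.wav", "1"), ("uno.wav", "1"), ("1.wav", "")]
print(find_sample("1", entries))    # one.wav (first alias beats the file named 1.wav)
print(find_sample("one", entries))  # one.wav (matched by filename)
print(find_sample("two", entries))  # None
```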

But aliases have another use for special ways of recording samples.
Since you use normal CV it's not important for you, so feel free to skip this part.
You can duplicate lines to use different numerical values for the same sample. This is used especially in VCV voicebanks, where each sample contains several sounds.
To use the different parts of the same sample, give each part a different alias: UTAU looks for the alias first and ignores the filename if it finds one.

Or, well, the same sample may need more than two names (filename and alias), so you duplicate its line again to give it extra aliases.


-Offset- (blue zone of the start)

UTAU can't tell where the usable part of the voice in the sample starts. It will NOT ignore any unwanted silence or noise at the start, so you must mark where the unused zone is.
Well, if you cut the silence manually and the whole sample is usable, then this zone can be ignored (personally, I don't like to use zero as a value here, so even if there is no silence I'll input "1" millisecond. It's such a small value that it won't be noticed anyway).

Also, it's difficult to pronounce samples in isolation, and some sounds at the start may come out too long and take up too much of the vowel's time in songs (this usually happens with consonants like s and n).
Just add the extra start of those consonant sounds to the blue zone and UTAU will ignore it and use a shorter part.

In UTAU's built-in editor this is the blue zone at the start.


-Consonant- (pink zone) & -Vowel- (white zone)

The consonant part is the part of the sample whose length UTAU will not modify, and the vowel part is the part that UTAU will stretch, change and "experiment" with in order to make long notes.

Telling where each zone must start takes a bit of experience and practice.
Basically, the white zone is where the vowel sound remains constant, and the pink zone holds the part before it (the consonant and the consonant-vowel transition, where the vowel sound is starting to form).
Example: the sample sa becomes saaaaa for a long note (correct). If the s sound were in the white part it would sound like sssaaa (incorrect).

Samples with only vowel sounds (like "a.wav") have no consonant sound, but it's recommended to use the pink part for the small part at the start where the a sound is forming.
(But this can be ignored if you feel like it.)

Now there are some exceptions, like diphthongs.
An example: for the "kya" sample, the pink zone is the k sound, the y sound and the little zone where the a sound is forming, and the white zone is the part where the a is mature.
So a long note is kyaaaaa (correct). If the y were in the white zone it would be kyyyaaa (incorrect).

Now two more points, but as they're more advanced (and not usually related to CV) feel free to skip them.
Note1: The sssaaa sound in the earlier example is described as incorrect, but a song may want it.
In the properties of each note there is a value called "Consonant velocity". Using values like 75 (75%) or 150 (150%) will make the pink part play slower or faster. Make it slower and you'll get the sssaaa effect.
Note2: With some languages you may need stretched consonant sounds. In those cases the consonants are in the white zone. But technically they act as "semi-vowels".


-Cutoff- (blue zone of the right)

With this zone at the end of the sample you tell UTAU which part you don't want to use.
Take into account that it's not only the silence and noise after the desirable white zone, but also the part where the vowel has lost its constant quality (the zone where the vowel starts to fade must be ignored).

Note: Sometimes you don't want to ignore the zone where the vowel fades out.
Some voicebanks have special samples/aliases that offer this as a special option, but in normal samples you usually want UTAU to ignore it.


-Preutterance- (red vertical line)

Preutterance marks the point of the sample that will be "pasted" at the start of the note. Everything to the left of this red line is placed before the note itself and steals time from the previous note.
This line is placed exactly where the good vowel is starting to form (so usually inside the pink zone).

The explanation for this is musical: the consonant sound can't carry the tune. It's the vowel sound that makes the tone. So the note must start with the vowel.

The previous sample may overlap the note because of this, but the next subject, Overlap, matters more for that.

Note: If any note needs a special setting, you can change its preutterance in the properties of that particular note.


-Overlap- (green vertical line)

Overlap is kinda special: instead of modifying the current note it modifies the previous one. Overlap marks the point where the previous note will end.
The reason for this setting is to make the changes between notes smoother, thanks to UTAU's fading tool.
In the zone between the start of the pink zone and this green line, the previous note and the note being configured play at the same time.

This is especially useful for samples with long consonants like s, n, m, etc.
For example, take the note ka followed by sa (kasa). We place the overlap of sa in the middle of the s sound, so after treating the ust (see the link near the start of this post) the first a fades smoothly into the s sound, making it sound like kasa (and not like ka -silence- sa).

So, usually, this green line goes around the middle (25%-60%?) of the consonant sound (warning: not the consonant zone -pink zone-, just the first consonant sound).
It may be more or less depending on how you pronounced the sample, so it needs a bit of trial and error.

Note: This line can be placed before the sample, making it a negative value. This means the previous sample ends even before the current note's sample starts, creating a silence between the notes. This is used for some plosive consonants like k and p. As always with Overlap, trial and error, and even checking other completed voicebanks, is needed to know how each sample should be treated.

NoteB: Just like Preutterance, the Overlap setting can be changed individually for a particular note in its properties.
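
To put numbers on Preutterance and Overlap together, here is a tiny timing sketch. It only restates the description above (everything left of the red line goes before the note, and the previous note ends at the green line); it ignores whatever adjustments UTAU makes for very short notes, and the function name is made up.

```python
def note_timing(note_start, preutterance, overlap):
    """Return (sample_begins, previous_note_ends) in ms on the song timeline."""
    # Everything left of the red line is played before the note starts.
    sample_begins = note_start - preutterance
    # The green line, counted from the start of the sample, ends the previous note.
    previous_ends = sample_begins + overlap
    return sample_begins, previous_ends


# Positive overlap: the two notes play together for 60 ms (910 to 970).
print(note_timing(1000, 90, 60))   # (910, 970)
# Negative overlap: the previous note stops 10 ms before this sample begins.
print(note_timing(1000, 90, -10))  # (910, 900)
```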
 

animecrazy13

Momo's Minion
Thread starter
irei1as link said:
(Edit:
Warning - while you were typing 2 new replies have been posted. You may wish to review your post.
Oh well, I'll input my reply anyway. The more the merrier, isn't it?)

Thank you so much! Your reply was extremely helpful! She sounds better already! :D
 

IrisFlower

Precious Flower with Thorns
Supporter
Defender of Defoko
Taylor Savage link said:
after reading all that otoing still confuses me

Well otoing is what makes your UTAU sound nice and not a choppy stuttery mess.

Did you try otoing while following their explanations? Sometimes actually doing it is what it takes to actually get it right. If you've honest to goodness opened up UTAU and tried oto-ing yourself tell me what part of their explanations confuses you and I can try and put it into more simpler terms for you. But ya know keep in mind what Yue said, people oto a bit differently(my method is way different from his but it works and I'm comfortable with it :3 ) so yeah;;
 

Taylor Savage

Teto's Territory
IrisFlower link said:
Taylor Savage link said:
after reading all that otoing still confuses me

Well otoing is what makes your UTAU sound nice and not a choppy stuttery mess.

Did you try otoing while following their explanations? Sometimes actually doing it is what it takes to actually get it right. If you've honest to goodness opened up UTAU and tried oto-ing yourself tell me what part of their explanations confuses you and I can try and put it into more simpler terms for you. But ya know keep in mind what Yue said, people oto a bit differently(my method is way different from his but it works and I'm comfortable with it :3 ) so yeah;;

I just tried it but it kinda mucked up the sound and still sounds terrible.
 

Yue Nagareboshi

Senior Tutor
Tutor
Defender of Defoko
Taylor...

Please post images of your otoing screen (the visual editor) so we can see exactly what you are doing, as well as an audio example.

Second, have you adjusted the ust to your oto settings?
http://utaforum.net/index.php?topic=393.0

Saying it "still sounds terrible" gives us no idea of what you mean by that, and we are not psychics who can guess your particular issue. Please give us samples that we can work with.
 

Taylor Savage

Teto's Territory
Most of it sounds ok, but when it gets to the 'S' and 'K' sounds it makes a distorted sound. I tried to fix it but it doesn't work. I'm worried that if I put it up for download nobody will like it.
 

IrisFlower

Precious Flower with Thorns
Supporter
Defender of Defoko
Well give us screenshots and an example or we can't help you fix it ^^;; How can we help if we have nothing to look at or listen to?

You don't have to upload the voicebank just post a sample to soundcloud.

At any rate, have you tried re-recording and deleting the frq files?
 

Halo

Icon by Wanpuccino @ DA
Administrator
Defender of Defoko
Please don't double post in quick succession. In the future, please edit your last post with the new information.

Also, your strange "k" sounds might be due to recording as opposed to otoing, have you figured out how to post a screenshot yet?
You should have a few tools inbuilt on your computer to do this, either the prtsc button or "Snipping Tool" in accessories.
 

Taylor Savage

Teto's Territory
Halo link said:
Please don't double post in quick succession. In the future, please edit your last post with the new information.

Also, your strange "k" sounds might be due to recording as opposed to otoing, have you figured out how to post a screenshot yet?
You should have a few tools inbuilt on your computer to do this, either the prtsc button or "Snipping Tool" in accessories.
Sorry I just gave it another try. sorry Halo-sensei