I've been seeing tons of different methods for otoing CV-style voicebanks, from both native and overseas users. Despite using a plethora of voicebanks and reading through every single resource I could find, I keep running into vastly different opinions that contradict each other.
I guess my question is: does a good oto.ini configuration vary from voicebank to voicebank, or is there a standard formula that, if executed well, prevents problems and guarantees really good performance?
I've prepared a few examples. Each sample is taken from an entirely different voicebank. Their publishers/otoers are experienced/advanced users.
Sample A [KA]
Sample B [KA]
Sample C [KA]
Sample D [KA]
As you can see, each sample has different configuration settings:
Sample A claims that since "K" is a hard consonant, its overlap should be set to a low (but positive) value.
Sample B claims that since "K" is a hard consonant, its overlap should be set to a negative value.
Sample C claims that although "K" is a hard consonant, in this case the voice provider emphasized the consonant part of the sample (k), so its overlap should be half the value of the preutterance. (Basically, it is otoed like a soft consonant.)
Sample D claims that since "K" is a hard consonant, the sample should have silence before the actual sound starts. The overlap has a positive value.
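For anyone who wants to compare the four approaches side by side, here's a rough Python sketch that parses oto.ini lines (the usual `wav=alias,offset,consonant,cutoff,preutterance,overlap` order) and labels the overlap style. Note that the numbers below are made up for illustration and are NOT the real values from Samples A-D; the function names and the 0.4/0.6 ratio thresholds are also just my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str
    alias: str
    offset: float        # left blank, ms
    consonant: float     # fixed (unstretched) region, ms
    cutoff: float        # right blank, ms (negative = measured from offset)
    preutterance: float  # ms
    overlap: float       # ms


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one 'wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    wav, rest = line.strip().split("=", 1)
    alias, *fields = rest.split(",")
    # Treat empty numeric fields as 0 so incomplete lines don't crash.
    nums = [float(f) if f.strip() else 0.0 for f in fields]
    offset, consonant, cutoff, preutterance, overlap = nums
    return OtoEntry(wav, alias, offset, consonant, cutoff, preutterance, overlap)


def overlap_style(entry: OtoEntry) -> str:
    """Label the overlap relative to the preutterance (thresholds are arbitrary)."""
    if entry.overlap < 0:
        return "negative"
    if entry.preutterance <= 0:
        return "undefined"
    ratio = entry.overlap / entry.preutterance
    if ratio < 0.4:
        return "low positive"
    if ratio <= 0.6:
        return "about half of preutterance"
    return "high positive"


# Hypothetical values only, chosen to mimic each style described above.
SAMPLES = {
    "A": "ka.wav=ka,50,120,-300,60,10",
    "B": "ka.wav=ka,50,120,-300,60,-15",
    "C": "ka.wav=ka,50,120,-300,60,30",
    "D": "ka.wav=ka,20,150,-300,90,25",  # extra lead-in before the consonant
}

if __name__ == "__main__":
    for name, line in SAMPLES.items():
        entry = parse_oto_line(line)
        print(name, entry.preutterance, entry.overlap, overlap_style(entry))
```

Keep in mind this only looks at the numbers. Sample D's "silence before the sound" depends on where the offset sits in the actual waveform, so it can't be told apart from Sample A by the values alone.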
These are the most popular variations of hard consonant otoing I found. (I didn't mention vowels, soft consonants and glides since I find them the easiest to configure.)
Despite their differences, I found the above methods to perform well. Each vocal differs slightly from the others, but overall they are easy to use and produce smooth results; nothing sounds wrong. I know what the parameters do, I just don't know what the best configuration is, since I tested them all and honestly 3 out of 4 are pretty usable. I am overthinking this and becoming paranoid. Please share your thoughts and opinions! Thank you!