About engine noise and voice "deformation"

Mougeki Mero

Defoko's Slaves
Defender of Defoko
So, I've recently been listening to some interesting things on SoundCloud when this came to my mind... how much can different engines really influence the handling of a vocal?

(Before anyone brings this up: I posted this thread in Chit-Chat because I just want to discuss this topic and it is an irrelevant thing. I won't make any use of this, I am just curious...)

So, I see that everyone (not really, but most peeps) who works with/listens to/knows a vocal synthesizer tends to say the cliché "...that synth has engine noise; this synth distorts vocals," etc., and I'm talking especially about the famous ones, like UTAU.

I did some research myself last year: I imported a few samples from different voices and tested them on three singing synths (UTAU and two others that I won't name) and two talking synths (won't name which ones either). I also won't tell how I did this, so don't ask me, okay?

Anyway, back to the topic: after importing said samples, I noticed that two of the singing synths didn't modify the vocal samples at all. They sounded the same, even though all of them were different tones (a nasal voice, a mature one, a "sensual" deep one, a high-pitched one, etc.). The third singing synth modified them a bit; it was noticeable that the voices got more "high-pitched," BUT it wasn't really as much as most people would say.
About the engine noise, I really can't see why everyone says that UTAU makes an engine noise that "OMG it kills me." Really, only the oldest version of UTAU did the robotic sound... Resampler is a very good resampler, btw; not the best, and it does add a bit of engine noise, but again, I don't think it is that much. The only thing that I noticed does have an influence is the mic quality, which could make a voice unplayable at times or give it very bad quality... just like UTAU does, actually!

About the talking synths: none of them modified the voice tone that much. I couldn't even tell unless I paid real attention.
--------------------

Sorry for the wall of text, I am almost done. So, I am now really confused about whether or not a vocal synth really modifies a voice, what causes this modification, etc. I am sure all this may vary depending on how the synth is programmed to work, but maybe some of you have answers for a specific synth? I'd like to hear them.

I did some research on the internet and, coincidence or not, most of the sites I visited cited the "VOCALOID Luka case," in which her voice was meant to sound sensual but ended up sounding way different from what her creators intended. I see most people only use this "fact" (I don't know if it is true or not, tbh, and the VOCALOID wikia is not always the most reliable source) as a basis and throw out a reply like "if you import X vocal into X synth it will sound different, because it happened to VOCALOID2 once," even though that is based on only one vocal, and only one type of vocal.


So, I am really curious rn and would like to hear different answers... Thanks!!! (And sorry again for the walls of text, I will pay the doctor to heal your eyes! I know they must be tired ;A; )
 

kimchi-tan

Your local Mikotard
Global Mod
Defender of Defoko
Moved this thread to Singing Robots General since it is more appropriate here and because this thread is not "irrelevant" :smile:
 

na4a4a

Outwardly Opinionated and Harshly Critical
Supporter
Defender of Defoko
Different engines don't really have noise... it's more just distortion (which can be interpreted as noise, but isn't really the same).
Any "noise" you hear is often just the result of the voicebank itself being low quality, with the engine making it more obvious. It usually doesn't take a lot, either.

Whenever you manipulate audio, some sort of side effect is going to occur. This is even more the case with human vocals, because the voice is "different" and cannot be handled the same way as nearly anything else. One of the main things is the formant of the voice, which affects the perceived gender and the quality of a vowel sound (more or less).
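(A tiny illustration of the formant point, in plain NumPy; this is a generic sketch of naive resampling, not the code of any actual resampler. Dropping every other sample raises a tone an octave, but it scales every frequency in the sound by the same factor, formants included, which is why crude pitch-shifting "chipmunks" a voice instead of just moving its pitch.)

```python
import numpy as np

SR = 44100          # sample rate (Hz)
F0 = 220.0          # pitch of our stand-in "voice" (Hz)
t = np.arange(SR) / SR
signal = np.sin(2 * np.pi * F0 * t)

# Naive resampling: keep every other sample and play back at the same
# rate. The pitch doubles -- but so does every other frequency in the
# spectrum, formants included.
shifted = signal[::2]

def peak_hz(x, sr):
    """Frequency of the strongest spectral peak."""
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1 / sr)[np.argmax(spec)]

print(peak_hz(signal, SR))   # ~220 Hz
print(peak_hz(shifted, SR))  # ~440 Hz: everything moved up an octave
```

A real resampler tries to move the pitch while keeping the formant envelope in place, which is exactly where the per-engine artifacts described below come from.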

Each engine/resampler processes vocals in a different manner and thus the distortion or otherwise unwanted sounds are different as well.



In the case of Resampler, it's usually a squealing sound that cannot be removed with flags. You also tend to lose a certain "richness" of the voice, which makes it sound weaker and shriller than the original sample.

Fresamp11 tends to have a weird formant filter that makes voices sound strange; you can usually fix it with the F and L flags (e.g. F2L2 is a general combination that works well), but vowels tend to be very smooth.
Fresamp14 will render many voices with a "rough" and "bumpy" sound and also over-exaggerates the breathy part of the voice.
Both versions will start to squeal like mad pigs if you push a voice too far.

Tips and tn_fnds sound nearly the same; they are arguably very truthful to the sound of the voice, but Tips tends to pop/ping when it crosses over different sections of the sample.

Moresampler doesn't use the original sample at all but instead a model of the sample, so you really can't compare. However, as you shift a voice up in pitch, the breathy part tends to become disjointed and obnoxious, though this is often only in extreme cases and with lower male vocals. It also sometimes requires you to manually fix its mrq file, or it will render incorrectly and ignore the tonal part of the voice. (Not a bug, just a matter of the pitch analysis not being able to contend with every voice.)

The list goes on.



All in all, any time you manipulate audio, something has to go "wrong"... it just depends how obvious it is and whether you are okay with the result.
The less you do to a sample, the fewer negative effects there will be. Feeding a sample through a resampler/engine without doing anything to it will pretty much give you back what you put in. It won't be exact, but most of the ill effects come with heavier manipulation.
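(To make the "it won't be exact" point concrete, here's a minimal NumPy sketch, again not any real engine's code: even the crudest manipulation imaginable, linear-interpolation resampling, fails to return the original samples after a simple up-and-back-down round trip.)

```python
import numpy as np

SR = 44100
t = np.arange(4096) / SR
x = np.sin(2 * np.pi * 220.0 * t)  # a plain 220 Hz tone

def stretch(x, ratio):
    """Resample by linear interpolation -- the crudest 'engine' possible."""
    n = int(len(x) * ratio)
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)

# Shift the note up a fifth, then back down to where it started.
round_trip = stretch(stretch(x, 2 / 3), 3 / 2)
n = min(len(x), len(round_trip))
err = np.max(np.abs(round_trip[:n] - x[:n]))
print(err)  # small but nonzero: even a trivial manipulation is lossy
```

The heavier the manipulation (bigger pitch shifts, more passes), the larger that residual error gets, which matches the "more manipulation, more ill effects" rule of thumb above.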

No one really knows the whole story in the case of Vocaloid, since it's proprietary. Vocaloid 1 was more analysis-based, whereas Vocaloid 2 and beyond actually contain the samples within the ddb files. What it does to process these samples is an enigma... well, actually there is some information, but it's very vague. Either way, it too has its own distortion or effect on the voice, which can be heard if you listen closely enough and get the chance to compare the original samples, but they have probably put a LOT of money into making it perceptibly less obvious... with different trade-offs.

EDIT: almost forgot. The case of voices not "matching" their intended tone is often a matter of the voice being acted wrong, or of a lot of the sound of someone's voice being derived from their intonation. Vocal synthesis is difficult and recording is sometimes unnatural, so the results vary a lot. Vocal synthesis will capture the tone, but not the pitch fluctuations and fine nuances of a voice.
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
I think a good comparison for how engines can change/make a noticeable difference from the "true" voice is voice providers who either had an UTAU (nostraightanswer, Misha, akyglancy), sang a duet with their synth (Dahee, Lia), or covered their respective synths' songs (Asami Shimoda, Yuu Asakawa).

Common differences between the human and any synth: lack of dynamics and tone shifts without multi-pitch or parameters like those of cevio/alterego, and lack of velocity differences by default.

Common differences between human vs utau: synth tone can be deeper. More true to the source.

Common differences between human vs vocaloid: the synth can sound noticeably thin, even nasal. Compare Gackt to Gakupo, Kenji-B to Dex, IA to Lia, Luka to Yuu... The most glaring example would probably be Hachi vs Ruby vs Misha. Misha presumably used the same tone for both libraries, but Ruby's nasality was deemed less palatable to the ears, whereas Hachi was fairly popular.

I can only go with the most... layman of ways to describe the differences as I hear them. We don't have access to the Vocaloid development kit and years' worth of PhD knowledge to draw from, after all. Not to mention that not all voices respond the same way to a given resampler.
 

Mougeki Mero

Defoko's Slaves
Defender of Defoko
Thread starter
Thanks for both of your answers @suyeon and @caustic!!!

Well, there is still one question I have, and I also think I should reply to your replies XD


@Caustic: Woah, that is a 100% better answer than those I got/saw in the past. With UTAU I had some idea of how this happens, especially because it relies on resampler tools, and different ones made by third parties exist. Although, like you said, there is some change to the voice, I don't see the "oh god the voice is ruined" difference... What I mean is that it doesn't sound like a big change to the voice tone itself.

About the voice acting: that was one of the most annoying things I heard in the past; the replies weren't as specific as yours, just a generic "it will/won't be compatible" reply. Thanks for your reply o/ Really informative!
BTW, I've made a tutorial and a video "discussing" the resamplers' modifications in UTAU. I think I should edit it and post it soon...! It is really informative, with tips that will most probably be helpful for UTAU users.

@Suyeon: I forgot about those, and especially Dahee. The difference is really big between the seiyuu and the voice in the vocal synth... although I think this big difference is mainly due to her not using her normal voice when recording for VOCALOID. However, those aspects you listed are indeed noticeable, especially the nasal aspect; that is something I noticed too! Still, I don't see how this could change the entire sound of a voice, like in the "Luka case"... unless the people who tested Luka's "sexy" voice used it higher, out of her range...
----------------------------------------------------------

Well, I still have another question: I decompressed a voicebank of a vocal synth, and I noticed the nasal difference between the raw sample and its rendered version, but the tone was pretty much the same, no big difference. If I were to look at the most minimal details, I'd say the voice gets a bit higher-pitched/younger when rendered, but the difference is really unnoticeable. So, if I am allowed, I'd say that for the good vocal synths (especially paid ones), the supposedly drastic voice tone change is a big rumour for now???

-----------------------------------------------------------

On a side note, I think it's quite possibly doable to use UTAU to make a "near perfect" voicebank that is unaffected by the deformations (even if they are really small). However, I am sure it'd require each sample to be recorded at each pitch (a "ka" sample for C3, C#3, D3... etc.), and because of oto limits I think this "dream" dies right there;;