Advice for a Classical Voicebank

Charis

Hi all,

I am a new Utau creator and am looking for some advice on something (seemingly) a bit niche.

I am a trained classical/operatic singer, with a naturally high voice. I haven’t seen too much advice regarding recording for different voice types, and what I have read is inconclusive. I’ve done quite a bit of reading, but nothing I've found has really been helpful enough to my situation.

Context specs:
  • Recording on a Blue Yeti (not necessarily the best for high ranges, I’ve read, but still among the better USB mics)
  • Using a Mac (Utau-Synth)
  • Recording with Oremo (with BGM)
  • Recording a CV voice bank to begin with (but interested in other kinds)
  • Aiming for a natural-sounding voicebank with a high range

My first question is whether or not to record with vibrato. My vibrato is strong, and singing with it feels more natural than forcing a straight tone. However, one source I read advised against it, saying vibrato can throw off otoing and cause unwanted noise. Another source said that recording with vibrato can make a bank sound more human. Does anyone have advice on which would be better?

My second question is about what pitch to record on. I did my first recording session using the standard F4 BGM in Oremo, but that is actually fairly low in my voice. I recorded another version at C5 (so I could still harmonize with the BGM), which is more comfortable for my voice but does raise the pitch of the samples. I’m pretty sure that in many of my recordings the waveforms are too tall and may cause distortion or clipping, so I’d have to be more careful and move the mic farther away from me, I suppose?

Finally, should I be keeping anything particular in mind when otoing? I haven’t really touched otoing yet, aside from reading about it. I understand that Oremo has the Generate Oto feature (which apparently isn’t that great for CVs), but would it still be advisable to use that feature and then fine-tune things? Or is it better to let Utau oto it upon import (it does do that, right?), followed by fine-tuning? Otoing also brings up frequency mapping, which is apparently not a feature in Utau-Synth. Does anyone know otherwise, or know another way to do this on a Mac?

I have goofed around a bit with my prelim banks (one recorded lower with vibrato, one recorded higher without), and I put both through Oremo’s Generate Oto feature to test what they sound like in Utau (neither with further tweaks). Both sound fuzzy and lack clarity, but I’m not sure if that’s normal or not. The raw voice recordings sound clear, so I’m not sure if it’s an issue caused by the otoing, or what. Could anyone let me know if I’m on the right track or not? I’m a total amateur haha.

Thanks so much for any advice; forgive my cluelessness!
 

Sors

Hi, so about vibrato: no, record without it. I really don't recommend it; it can **** up the samples tbh and give not-so-great results.

About the pitch, you'll probably wanna go for F4 if you only record one pitch. See, UTAU (well, any vocal synth) is horrible at pitching samples down. And while it mostly depends on what songs you are gonna cover or make, C5 will not really help you with songs that sit mostly below C5.
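To put rough numbers on the F4 versus C5 trade-off, here is a minimal sketch that counts how many semitones a sample would have to move to reach a target note. It assumes equal temperament with A4 = 440 Hz and the MIDI-style numbering where C4 = 60; the function names and the target notes are made up for illustration and are not part of UTAU.

```python
# Semitone offsets of the natural notes within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'F4' or 'C#5' to a MIDI number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    semitone = NOTE_OFFSETS[letter]
    if rest.startswith("#"):
        semitone += 1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + semitone


def note_to_hz(name, a4=440.0):
    """Equal-tempered frequency of a note, assuming A4 = 440 Hz."""
    return a4 * 2 ** ((note_to_midi(name) - 69) / 12)


def shift_needed(recorded, target):
    """Semitones the sample must move; a negative value means pitched down."""
    return note_to_midi(target) - note_to_midi(recorded)


# Compare an F4 recording with a C5 recording for a few hypothetical targets.
for target in ("A4", "F5", "A5"):
    print(target, shift_needed("F4", target), shift_needed("C5", target))
```

For a note at A4, for example, the F4 recording is shifted up 4 semitones while the C5 recording is shifted down 3, which is the direction described above as the weaker one.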

For otoing, please, please make sure you have at least 1 second of silence before the sample, and place the preutterance correctly. There are multiple oto guides, and a lot of people take oto requests and commissions.
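For orientation, each line of a voicebank's oto.ini follows the pattern `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with all values in milliseconds. The sketch below parses one such line; the `OtoEntry` class, the helper names, and the sample values are illustrative assumptions, not output from UTAU or Oremo.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # where the usable sample starts (skips leading silence)
    consonant: float     # length of the region that is never stretched
    cutoff: float        # where the usable sample ends
    preutterance: float  # point, measured from offset, that lands on the note start
    overlap: float       # crossfade region with the previous note


def parse_oto_line(line):
    """Parse one 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    filename, _, params = line.strip().partition("=")
    alias, *numbers = params.split(",")
    if len(numbers) != 5:
        raise ValueError(f"expected 5 numeric fields, got {len(numbers)}")
    # Empty fields are treated as 0 in this sketch.
    values = [float(n) if n.strip() else 0.0 for n in numbers]
    return OtoEntry(filename, alias, *values)


def note_start_in_file_ms(entry):
    """Position in the WAV file that is aligned with the start of the note."""
    return entry.offset + entry.preutterance


# Hypothetical entry: 1 second of silence is skipped by the offset.
entry = parse_oto_line("ka.wav=ka,1000,120,-400,60,20")
print(entry.alias, note_start_in_file_ms(entry))
```

With about a second of silence recorded before each sample, the offset ends up near 1000 ms, and the preutterance is then counted from that point.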
 

Nohkara

Pronouns: He/him
When you record an UTAU voicebank, you must not sing your samples with vibrato. Raw recordings must have a stable pitch.

Vibrato is added in the UST* afterwards, when you tune it.

For good-sounding results, try to pronounce things the way you would when you sing naturally. If you record CV, make sure there is about 1 second of silence before and after the actual sound (after, because you might also want to oto endings, commonly called “ending breaths”). For a good-sounding CV, it’s enough to hold the vowel part steady for 1 to 2 seconds at most.

*A UST file is like a MIDI file, but in a format exclusive to UTAU, used to save projects. A UST stores the project’s notes along with their lengths and pitches.

Beyond that, a UST stores information such as which voicebank was used, which resampler (on the Windows version), which flags, each note’s lyric, and where and how pitchbends and vibratos were placed on which notes.
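As a rough illustration of that structure, a UST is a plain-text file made of bracketed sections, with one numbered section per note holding `key=value` lines such as `Length`, `Lyric`, and `NoteNum`. The sketch below reads only those note sections from a small hand-written sample; the sample text and the `parse_ust_notes` helper are assumptions for illustration, and a real UST carries many more fields (and is often saved in a non-UTF-8 encoding).

```python
# Hand-written sample for illustration; not exported from UTAU.
SAMPLE_UST = """[#VERSION]
UST Version1.2
[#SETTING]
Tempo=120.00
[#0000]
Length=480
Lyric=ka
NoteNum=65
[#0001]
Length=240
Lyric=sa
NoteNum=72
[#TRACKEND]
"""


def parse_ust_notes(text):
    """Return one dict per numbered note section, in file order."""
    notes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[#") and line.endswith("]"):
            tag = line[2:-1]
            # Only purely numeric tags such as 0000 are note sections.
            current = {} if tag.isdigit() else None
            if current is not None:
                notes.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return notes


for note in parse_ust_notes(SAMPLE_UST):
    print(note["Lyric"], note["NoteNum"], note["Length"])
```

The point is only that vibrato and pitchbend settings live alongside these per-note fields in the project file, not in the recorded samples.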
 

Awaclus

If you move the mic farther away, it will pick up less of your voice but it will still pick up the exact same amount of room reverb, i.e. the samples will effectively get roomier, which you don't want. It's better to adjust the mic's input volume to get recordings that aren't clipping.
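One way to check whether a take is clipping, rather than judging by the height of the waveform, is to measure its peak level. The sketch below uses only Python's standard library and assumes 16-bit PCM WAV files; the function names and the 1 dB headroom threshold are arbitrary choices for illustration, not a standard.

```python
import array
import math
import sys
import wave


def peak_dbfs(path):
    """Peak level of a 16-bit PCM WAV file in dBFS (0.0 means full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # completely silent file
    return 20 * math.log10(peak / 32768)


def looks_clipped(path, headroom_db=1.0):
    """True if the peak is within headroom_db of full scale."""
    return peak_dbfs(path) > -headroom_db


# Example call on a hypothetical file:
# print(peak_dbfs("ka.wav"), looks_clipped("ka.wav"))
```

If many takes come back flagged, lowering the input gain and re-recording keeps the mic distance, and therefore the amount of room sound, unchanged.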
 

Charis

@Sors: So, would you say that UTAU is better at pitching samples up, then? See, again, C5 is fairly mid-low relative to some of the songs I will be making. Do you still recommend recording on F4? Thanks very much for your help!
@Nohkara: Thanks for your advice!
@Awaclus: Thanks for the catch; will do.
 

WinterdrivE

Definitely seconding everything above, especially with regard to vibrato (i.e., don’t). If you record with vibrato, you’re never not going to hear it, and you lose the ability to change its timing, speed, and depth; it’s just a permanent fixture in the voice at that point. Combine that with the fact that UTAU also has to stretch recordings to make them longer if needed, and you wind up with even more problems. So yeah, no vibrato.

Regarding the pitch, yes, UTAU and vocal synths (and audio in general) tend to pitch up better than down. That said, if you’re only doing one pitch, I say go for whatever is most natural and most closely achieves the tone you want.

Another option (and one that generally yields better results), however, would be to record multiple pitches, which will better capture the tone and expression of your singing across a wider range. If you’re starting out with one pitch, which is totally fine, I say do whatever’s most natural. If that happens to be C5, then it’s C5. (Notes below this will likely start to sound buzzy and thin, as usually happens when recordings are pitched down, but that’s par for the course.) But if you have the time and are up for it, I say go for multiple pitches and eliminate the ‘which pitch’ question.
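The idea behind a multipitch bank can be sketched as choosing, for each note in a song, the recorded pitch that needs the smallest shift. The snippet below is a simplified illustration of that selection, not how UTAU itself assigns pitches; the recorded pitches, the function names, and the tie-break (prefer the recording that gets pitched up) are all assumptions.

```python
# Semitone offsets of the natural notes within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'F4' or 'G#4' to a MIDI number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    semitone = NOTE_OFFSETS[letter]
    if rest.startswith("#"):
        semitone += 1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + semitone


def nearest_recorded_pitch(target, recorded):
    """Pick the recorded pitch closest to target; on a tie, prefer pitching up."""
    target_midi = note_to_midi(target)

    def cost(pitch):
        shift = target_midi - note_to_midi(pitch)
        # Sort by distance first, then put downward shifts (shift < 0) last.
        return (abs(shift), shift < 0)

    return min(recorded, key=cost)


# Hypothetical three-pitch bank.
bank = ["F4", "C5", "G5"]
for target in ("G#4", "A4", "E5"):
    print(target, "->", nearest_recorded_pitch(target, bank))
```

With more recorded pitches, every note ends up only a few semitones from a real recording, so the buzzy, thin sound from large downward shifts is less likely to come up.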

[edit]
Oh yeah, also definitely seconding and stressing what Yuki said above about trying to pronounce and record things as naturally as possible. It can feel weird since you’re usually recording gibberish on a single pitch, but try to sing the recordings as you would if they were part of an actual song. After all, the recordings will be used for singing actual songs.
 