I can maybe explain a few things that might help you make better choices when recording/configuring your bank.
The way (I believe) UTAU works is that it takes your recorded syllables and lines them up as you lay them out on the piano roll, then "vocodes" them (as in, uses a vocoder) with the "resampler" as the carrier wave, at the correct pitch.
So there are two sides to the process:
1) One side deals with the samples. This includes recording them, and blending them together in the piano roll ("crossfading," which is what your OTO.INI is for). Manipulating the samples is what will make your bank sound "natural" or not.
2) The other side deals with the vocoder. This includes choosing the pitch on the piano roll (as well as using "pitch bend" functions), and the resampler. Since laymen generally can't create their own resampler/carrier wave, and wouldn't know what would be best even if they could, you'll probably just select from what you can find.
I won't go into recording tips, since a dozen and a half people here can give you those if you ask. Both good and bad advice float around these forums, the vocal-synth community, and the population of all musicians. What's important to understand, however, is that any noise that makes it into the recordings will also be picked up and pitch-changed by the resampler/vocoder!
So if there is a lot of background noise, then that noise will be played along with your sample at whatever pitch you put into the piano roll. If there is a cat meowing in the background of your sample, it will be pitched and played too.
As for blending them together, I don't have a lot of experience configuring oto.ini files for UTAU, and you're better off asking the people here. But I will say that if your preutterance ("Attack") on a sample is too long, you'll feel some lag on that sample. That's what I'm hearing in your demonstration linked above, @Kurara.
Your preutterance and (post-utterance? I've forgotten what UTAU calls it) determine how much of a sample crossfades with its neighbors by default. You might even be able to get by without using an oto.ini at all, and just crossfade all syllables by hand, but that kind of defeats the purpose of UTAU, and would be a pain not only for you, but also for anyone else who might deign to use your bank.
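If it helps to see where those values live, here's a rough Python sketch that reads one oto.ini line. I'm assuming the commonly described layout, `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with everything in milliseconds; the "overlap" field may well be the one I couldn't name above. The sample line and its numbers are made up for illustration, and `OtoEntry` / `parse_oto_line` are just names I picked, not anything UTAU itself provides.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # ms skipped at the start of the wav
    consonant: float     # ms of the fixed (unstretched) region
    cutoff: float        # ms marking the end of the usable region
    preutterance: float  # ms of sound that plays before the note start
    overlap: float       # ms crossfaded with the previous note


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one line, assuming the layout described in the lead-in."""
    filename, rest = line.strip().split("=", 1)
    fields = rest.split(",")
    alias = fields[0]
    # Empty or missing numeric fields are treated as 0 in this sketch.
    numbers = [float(x) if x.strip() else 0.0 for x in fields[1:6]]
    numbers += [0.0] * (5 - len(numbers))
    return OtoEntry(filename, alias, *numbers)


# Hypothetical entry, not taken from any real voicebank.
entry = parse_oto_line("ka.wav=ka,50,120,-300,80,30")
print(entry.preutterance, entry.overlap)
```

A quick scan like this over a whole oto.ini could flag entries whose preutterance looks unusually long, which is the kind of thing that would cause the lag I mentioned.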
Laying notes out on the piano roll is pretty straightforward, whether you're using a prefab UST or creating one yourself. What might be less straightforward is how to use pitch-bends effectively. They aren't needed on every note, and are often best applied as very subtle bends right before notes, or little relaxed bits at the end of long notes-- like a pop singer who slides into the pitch of every line instead of properly jumping right onto the "correct" pitch. My voice instructor would have had a stroke if he heard me telling you to slide into pitches! But that's the method for pop music.
As for the resampler, you can think of it as the "vocal cords" of your UTAU. Yes, the inflection and personality all come from you, the recording artist of the voicebank's samples, but the tone of it comes from which resampler you choose. In reality, it is just a waveform that moves to whatever pitches you've laid out on the piano roll. It plays faster for higher notes, and slower for lower notes, just like your real vocal cords vibrate faster or slower depending on pitch. Each resampler has a slightly different composition (which is to say, the carrier wave of the vocoder is a slightly different wave), and in practice this will highlight and draw out different parts of your samples, create different overtones, and just generally sound different.
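To put numbers on "faster for higher notes," here's a small Python sketch of how a note on the piano roll maps to a frequency. It assumes standard equal temperament with A4 = 440 Hz and MIDI-style note numbers (A4 = 69); that's the usual convention, not something I've checked inside UTAU or any particular resampler.

```python
def note_to_hz(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency for a MIDI-style note number (A4 = 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


# One octave up doubles the frequency; one octave down halves it.
for name, number in [("A3", 57), ("A4", 69), ("A5", 81)]:
    print(name, round(note_to_hz(number), 2))
```

So every octave you go up on the piano roll, the carrier has to cycle twice as fast.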
So, if you're just using a glorified vocoder, why does the range of a voicebank matter? Well, a vocoder will resonate with the samples you're using... or rather, the farther the pitch of the vocoder (that is, on the piano roll) is from the pitch of your samples (the pitch you recorded at), the less strength the result will have. If you match the vocoder pitch and the recorded sample pitch, it will sound almost natural-- like it's your singing, but through someone else's vocal cords.
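If you want a rough way to measure "how far away," you can express the gap between the recorded pitch and the piano-roll pitch in semitones. This is only a sketch of that arithmetic in Python; the frequencies are example values, and I'm not claiming any particular number of semitones is where a given resampler starts to sound weak.

```python
import math


def semitone_distance(recorded_hz: float, target_hz: float) -> float:
    """Signed distance in semitones from the recorded pitch to the target."""
    return 12 * math.log2(target_hz / recorded_hz)


recorded = 220.0  # example: samples recorded around A3
for target in (220.0, 261.63, 440.0):
    gap = semitone_distance(recorded, target)
    print(f"{target} Hz is {gap:+.1f} semitones from the recording")
```

The closer that number is to zero for the notes in your song, the closer the output should be to the "almost natural" case described above.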
(of course, tell me if I'm dead wrong, I may misunderstand how UTAU works... orz)