It dosent have an oto.ini file btw
Load your voice bank into UTAU and then go to Tools > Voice Bank Settings (or press Ctrl + G).
It will open this window. Clicking Launch Editor will open the editor at the bottom, where you can see the waveforms.
The oto.ini file tells UTAU how you want each sample to be processed: where it starts and ends, which part to leave alone, and which part can be stretched.
Offset (light blue/purple): Where you want the actual sample to start; use it to remove silence (and other unwanted sounds) at the beginning of the sample.
Consonant (pink): This area tells UTAU not to modify this part of the sample; it's often used for consonants at the start of the sample.
Cutoff (also light blue/purple): Where you want the actual sample to end; use it to remove silence (and other unwanted sounds) at the end of the sample.
(Edit: The white area is what UTAU's resampler will stretch. It's often used for vowels.)
This part is a little confusing... there are people that can explain this better than I can.
Preutterance (red vertical line): Adjusts the timing of the sample (in the image, it's put where the consonant ends and vowel begins).
Overlap (green vertical line): Where the sample crossfades from the previous note (in the image, there is no overlap as it's right where the offset is).
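If it helps to see how those five settings end up in the file, here is a rough Python sketch that reads one oto.ini line. It assumes the commonly documented layout, `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with every value in milliseconds. The names `OtoEntry` and `parse_oto_line` and the sample values are made up for illustration; they are not part of UTAU.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str             # sample file name
    alias: str           # name you type in the note
    offset: float        # ms trimmed from the start of the sample
    consonant: float     # ms after the offset that UTAU leaves unstretched
    cutoff: float        # ms trimmed from the end (a negative value is
                         # commonly read as measured from the offset instead)
    preutterance: float  # ms after the offset where the note "lands"
    overlap: float       # ms after the offset used for the crossfade


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one line, assuming the layout described above."""
    wav, _, params = line.strip().partition("=")
    fields = params.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected alias plus 5 numbers, got: {params!r}")
    alias = fields[0]
    # Treat empty fields as 0 so half-filled lines still parse.
    numbers = [float(f) if f.strip() else 0.0 for f in fields[1:]]
    return OtoEntry(wav, alias, *numbers)


# Made-up example values, not taken from a real voice bank.
entry = parse_oto_line("ka.wav=ka,120,90,-350,60,20")
print(entry)
```

In this example the first 120 ms of ka.wav are skipped, the next 90 ms are kept as-is, and the preutterance sits 60 ms after the offset.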
I think that, because your voice bank doesn't have an oto.ini, UTAU is stretching the entire sample to the point that you can't hear it. Whatever silence is at the beginning of the sample will be stretched too, because there are no settings to tell UTAU what to do.
1-Theyre .wav and i dont know what the other things mean
Depending on what audio editor you are using, you have the option to export .wav as 16, 24, or 32 bit (or more). The current default resampler reads 16-bit .wav files.
(I think it has to do with quality, but I'm not an expert on this stuff.)
Mono is the opposite of Stereo. UTAU can read stereo samples, but you can get weird results (one of my old voice banks sounded like a dying cat because it had stereo samples instead of mono samples).
Stereo: 2 channels; Left and Right
Mono: 1 channel; Center
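If you aren't sure what your samples are, a small Python script can tell you. This is just a sketch using Python's built-in `wave` module; the function name `describe_wav` and the "utau_friendly" label are my own, and the check only reflects what I said above (mono, 16 bit). Note that the `wave` module only reads plain PCM files, so a 32-bit float .wav will likely raise an error instead of being described.

```python
import wave


def describe_wav(path):
    """Return channel count, bit depth, and sample rate of a PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()        # 1 = mono, 2 = stereo
        bit_depth = wav_file.getsampwidth() * 8   # bytes per sample -> bits
        sample_rate = wav_file.getframerate()     # e.g. 44100
    return {
        "channels": channels,
        "bit_depth": bit_depth,
        "sample_rate": sample_rate,
        # Assumption from the post above: mono, 16-bit samples are the safe choice.
        "utau_friendly": channels == 1 and bit_depth == 16,
    }


# Example (file name is hypothetical):
# print(describe_wav("ka.wav"))
```

If "utau_friendly" comes back False, re-export the sample from your audio editor as mono, 16 bit.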
I hope all of this helps in some way.