UTAU can be tricky to get right when you're first starting out, but don't feel discouraged!! it just means you have some more to learn!! owo/
the problem here is most notably the oto, or what sounds like a lack of one. otos tell the program when the consonant should be uttered, when the vowel should begin, and when the vowel, as i like to call it, "stabilizes." without an oto, the program will treat the entire sample as the "stabilized" vowel, meaning that, depending on the resampler you use, it will stretch or loop the whole thing, silence and consonants included.
kiyoteru has made a pretty good oto guide here, but the main things to remember are that the green line goes at the beginning of the consonant, the red at the beginning of the vowel, and the pink where the note "stabilizes," and that you should trim the silence out in the oto so there aren't any pauses during the note. once you add an oto to your utau, it will begin to sound like proper singing! owo/
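if it helps to see what the oto actually looks like on disk: each sample gets one line in oto.ini, in the form `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with all the numbers in milliseconds. below is a rough python sketch that reads one of those lines. the sample line and its numbers are made up for illustration, and `OtoEntry`, `parse_oto_line` and `stretch_length_ms` are just names i picked, not part of UTAU. the color notes in the comments are how i understand the editor maps to the file, so double check against the guide.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str             # sample file name
    alias: str           # name you type on the note
    offset: float        # ms of silence skipped at the start of the file
    consonant: float     # ms from offset that is never stretched (ends at the pink edge)
    cutoff: float        # negative: ms from offset where the sample ends
    preutterance: float  # ms from offset where the vowel begins (red line)
    overlap: float       # ms from offset used to blend with the previous note (green line)


def parse_oto_line(line: str) -> OtoEntry:
    """Split one oto.ini line into its named fields."""
    wav, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    alias = fields[0]
    # empty numeric fields are treated as 0 in this sketch
    numbers = [float(f) if f else 0.0 for f in fields[1:]]
    return OtoEntry(wav, alias, *numbers)


def stretch_length_ms(entry: OtoEntry) -> float:
    """Length of the part the resampler may stretch or loop.

    Only handles a negative cutoff (measured from the offset).
    """
    if entry.cutoff >= 0:
        raise ValueError("this sketch only handles negative cutoff values")
    return -entry.cutoff - entry.consonant


# made-up example values, not from a real voicebank
entry = parse_oto_line("ka.wav=ka,120,95,-310,60,20")
print(entry)
print(stretch_length_ms(entry))
```

with no oto line at all, every one of those numbers is effectively zero, which is why the whole file ends up being treated as the stretchable vowel.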