I've received the samples you've sent me and there are quite a lot of problems that make them near-impossible to get good results with in UTAU. These are generally resolvable by the voicer who recorded them, but I want to bring them to light because it's useful information and because some of these issues are beyond simple technicalities.
In this day and age I have NO CLUE how someone managed to record audio with a sample rate of only 8 kHz. I would recommend using a USB microphone. Files for UTAU need to be mono WAV files with a sample rate of 44.1 kHz and a bit depth of 16, which are the typical default settings in most software. Voxengo r8brain can be used to batch-convert the sample rate.
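If you want to check your files before sending them, here is a rough sketch in Python using only the standard `wave` module. The function name `check_utau_wav` and the wording of the messages are my own invention for illustration, and it only reads plain PCM WAV files, so treat it as a quick sanity check rather than a proper tool.

```python
import wave

# Target format described above: mono, 44.1 kHz, 16-bit.
EXPECTED_CHANNELS = 1
EXPECTED_RATE = 44100
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits


def check_utau_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"bit depth is {width * 8}, expected {EXPECTED_SAMPLE_WIDTH * 8}")
    return problems
```

Calling `check_utau_wav("na.wav")` on a mono 16-bit file recorded at 8 kHz would return a single message about the sample rate. It does not fix anything, so the actual conversion still needs a resampling tool.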
The filenames initially led me to believe that each file would be a single phoneme by itself, but each one turned out to be something more like "nanan". On top of that, all of the samples were quickly spoken. UTAU is meant for singing vocals, so the samples must be sung and sustained for a longer period of time. This gives more information to the length-stretching algorithms of UTAU's resamplers, so that there will be fewer artifacts and less distortion in the resulting vocal.
Single phonemes will not blend together and transition smoothly. Since Defoko isn't working properly, take a look at another existing CV Japanese voicebank. Rather than samples like "k" "a" "i" "u", there are separate "ka" "ki" "ku" samples. This is what my previously linked list had: combinations of each consonant with each vowel. Klingon's phonotactics, as noted in the other link I shared, have a maximum syllable structure of consonant-vowel-consonant. UTAU cannot support full CVC syllables and stretch the center vowel as expected, because it can only stretch the final phoneme of a given sample. Therefore syllables are broken down into their initial and final components. The final components are VC rather than the consonant by itself, because the vowel allows it to blend into the syllable as a whole.
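To make the CV and VC idea concrete, here is a small Python sketch. The consonants and vowels in it are placeholders, not the actual Klingon inventory, and writing the VC part as "a n" with a space is a naming convention I'm assuming for illustration. The function names are made up too.

```python
from itertools import product


def cv_list(consonants, vowels):
    """Combine each consonant with each vowel, e.g. "k" + "a" -> "ka"."""
    return [c + v for c, v in product(consonants, vowels)]


def split_syllable(onset, vowel, coda=""):
    """Break a CV or CVC syllable into its initial (CV) and final (VC) samples.

    The VC part keeps the vowel so it can blend into the sustained CV sample.
    """
    parts = [onset + vowel]
    if coda:
        # Assumed naming: vowel, space, final consonant.
        parts.append(f"{vowel} {coda}")
    return parts


# Placeholder inventory, for illustration only.
print(cv_list(["k", "n"], ["a", "i", "u"]))
# ['ka', 'ki', 'ku', 'na', 'ni', 'nu']

print(split_syllable("n", "a", "n"))
# ['na', 'a n']
```

So a syllable like "nan" would be sung from two samples: "na", whose vowel gets stretched to fill the note, followed by "a n" to close it off.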
Simply put, this isn't usable at all. If I find some free time, I will rewrite my list according to standard transcription of Klingon, add some other samples I think would be useful, and perhaps record the samples myself strictly for testing purposes. In the meantime, you would do well to learn about how existing UTAU voicebanks are structured, and how to apply the same old concepts to a new language. For example, English voicebanks also have to handle consonants at the end of syllables. I would be willing to explain all of these necessary concepts, but you are also welcome to join my Discord server where dozens of other people are also discussing UTAU.
https://discord.gg/rSzZD9P