This is what I've been able to see.
1- Recordings
Some recordings have minor flaws, but the main problem is that the recordings don't all have the same volume, which can make the UTAU sound strange overall. Some recordings also have the wrong consonants, which causes problems such as sounding too airy.
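To find which recordings are off in volume, something like this rough Python sketch could help. It assumes the samples are 16-bit mono WAV files; the function names and the 3 dB tolerance are just illustrative choices, not anything UTAU itself provides.

```python
import math
import struct
import wave


def rms_dbfs(samples, full_scale=32768):
    """Return the RMS level of integer PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / full_scale)


def read_mono_16bit(path):
    """Read a WAV file into a list of ints (assumes 16-bit mono audio)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono audio")
        frames = wav.readframes(wav.getnframes())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))


def flag_outliers(levels, tolerance_db=3.0):
    """Return the names whose level is more than tolerance_db away from the median."""
    finite = sorted(v for v in levels.values() if v != float("-inf"))
    if not finite:
        return []
    median = finite[len(finite) // 2]
    return sorted(n for n, v in levels.items() if abs(v - median) > tolerance_db)
```

You would build `levels` as `{name: rms_dbfs(read_mono_16bit(name))}` for every WAV in the bank and then re-record or normalize whatever `flag_outliers` returns. RMS over the whole file is only a rough guide, since files with long silences will read quieter than they sound.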
2- Frequencies
You hadn't generated the frequency maps, so I generated them, and as I've been telling you, they have errors. These can be fixed with edit freq map > select the malformation > vrq.
3- oto.ini
The oto.ini appears to be auto-generated, which works badly for CV banks. It doesn't use the "overlap" value at all, and that makes everything sound very robotic.
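For reference, each oto.ini line has the form file.wav=alias,offset,consonant,cutoff,preutterance,overlap. A small Python sketch like the one below could list the entries whose overlap is zero so they can be reviewed by hand. The helper names and the sample values in the usage note are made up for illustration, and a zero overlap isn't always wrong, so treat the output as a checklist rather than a verdict.

```python
from typing import NamedTuple


class OtoEntry(NamedTuple):
    filename: str
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float
    overlap: float


def parse_oto_line(line):
    """Parse one line: file.wav=alias,offset,consonant,cutoff,preutterance,overlap."""
    filename, sep, params = line.strip().partition("=")
    fields = params.split(",")
    if not sep or len(fields) != 6:
        raise ValueError(f"not a valid oto.ini line: {line!r}")
    # Empty numeric fields are treated as 0 here (an assumption of this sketch).
    numbers = [float(f) if f.strip() else 0.0 for f in fields[1:]]
    return OtoEntry(filename, fields[0], *numbers)


def entries_without_overlap(lines):
    """Return the parsed entries whose overlap value is exactly zero."""
    entries = [parse_oto_line(line) for line in lines if line.strip()]
    return [entry for entry in entries if entry.overlap == 0]
```

For example, given the lines "ka.wav=ka,50,120,-300,80,0" and "sa.wav=sa,40,130,-280,90,30", only the "ka" entry would be returned.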
4- It is not recommended to record appends with the "Base" voicebank.
It's preferable to create appends as separate voicebanks. If you do want to combine them, it's best to do it the way the Windows 100% UTAUs do: add a folder for the append (e.g., "S" for "Soft"), and in the oto.ini, enter each phoneme plus the "S" as the alias so the append samples can be used. Still, separate voicebanks are much more convenient, since combining everything into one bank also makes the appends more difficult to use.
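If you do go the combined route, the suffixing can be scripted instead of typed by hand. This is a minimal Python sketch, assuming the append's recordings have their own oto.ini and that an empty alias should fall back to the file name without its extension; `add_append_suffix` is a hypothetical helper, not part of UTAU.

```python
def add_append_suffix(line, suffix):
    """Return an oto.ini line with the suffix appended to its alias."""
    filename, sep, params = line.strip().partition("=")
    if not sep:
        raise ValueError(f"not a valid oto.ini line: {line!r}")
    alias, comma, rest = params.partition(",")
    if not alias:
        # Assumption: with no alias set, use the file name minus its extension.
        alias = filename.rsplit(".", 1)[0]
    return f"{filename}={alias}{suffix}{comma}{rest}"
```

With this, "ka.wav=ka,50,120,-300,80,30" and the suffix "S" become "ka.wav=kaS,50,120,-300,80,30", so the soft sample is picked by typing "kaS" in the note.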
What I would recommend is to redo the oto.ini and split the appends into separate voicebanks. Also, if you want more versatility in the voicebank, you can add pitches using BGMs.