That's... really weird. How did you generate the .frq files for the voicebank? And does this happen with only the ust you're using, or with any ust?
I think a link to the voicebank would help too. It's hard to figure out the problem with just the sample and no other info ;o;