    Note: upon review it has come to my attention that this resource may come off as extremely condescending, and that many people already know this. But whatever.

    This is a really, really common pronunciation issue in Japanese VBs made by English speakers. It's also really, really easy to correct!

    The "t" sound usually used in Japanese is slightly different from the one used in most English accents.* While in both languages it's considered a "voiceless alveolar stop," in Japanese, the "t" is considerably less aspirated than in English, meaning a much weaker puff of air follows the release. More or less, this means it sounds much less like the "ts" sound.

    *A stereotypical New York City accent is probably the furthest from the Japanese sound.

    Here is a short clip I recorded to demonstrate the difference. (Also, to demonstrate why I don't make my own UTAU. Sorry for my horrible microphone and dead-sounding voice.) The first pronunciation in each pair is Japanese, the second American English.
    You can easily see the difference in the spectrogram:

    Note how in the AmE "te," the "t" sound has a broader frequency range, including up to about 10 kHz, while the Japanese "t" peters out at around 6 kHz.
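
    If you'd rather check your own recordings with a number than by eyeballing a spectrogram, here's a rough sketch of the idea. It measures how much of a short snippet's spectral energy sits at or above a cutoff (6 kHz here, taken from the observation above). The function name and the synthetic test tones are my own stand-ins, not real "t" recordings, and it uses a plain DFT so it only needs the standard library (it will be slow on anything longer than a few hundred samples).

```python
import cmath
import math


def high_band_ratio(samples, sample_rate, cutoff_hz=6000.0):
    """Return the fraction of spectral energy at or above cutoff_hz."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # Plain DFT over the positive-frequency bins (DC at k = 0 is skipped).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        # Bin k corresponds to the frequency k * sample_rate / n.
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total else 0.0


def tone(freq_hz, sample_rate, n):
    """Synthetic sine wave, used here only as a stand-in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 44100
N = 441  # 10 ms of audio; bins land on exact multiples of 100 Hz

low_only = tone(3000, SAMPLE_RATE, N)
low_plus_high = [a + b for a, b in zip(low_only, tone(9000, SAMPLE_RATE, N))]

ratio_low = high_band_ratio(low_only, SAMPLE_RATE)
ratio_mix = high_band_ratio(low_plus_high, SAMPLE_RATE)
```

    With real audio you would cut out just the burst of the "t" and pass those samples in. A noticeably higher ratio would suggest more high-frequency energy, which is what the more aspirated "t" shows in the spectrogram. Treat the 6 kHz threshold as a starting point, since it comes from one recording on one bad microphone.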

    To stop over-aspirating, try to avoid pushing air between your tongue and teeth (technically your alveolar ridge :wink:) when you enunciate the "t." Please reply if you need more clarification because it's really hard to explain this in text.

    Note that Japanese sometimes does use a more aspirated "t" sound for effect! Gakupo's Power_V3 append is a good example of this: most of his "t" sounds are aspirated, to varying degrees.
    Real singers may use it in rock songs, especially in parts they want to emphasize - note how in this cover, at ~0:42, Majiko uses an aspirated "t" for the "to" in "tomete," which is part of the actual verb stem, but not the "te," which is part of the conjugation. In general, that video makes a great exercise in differentiating between aspirated and unaspirated "t" - try and note all the times she uses aspiration, and the part of the word that she uses it in.

    One way you might implement this in a voicebank is to create additional phonemes that use an aspirated "t," labelled using romaji or katakana.
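
    As a rough sketch of what that labelling could look like, here's a small script that builds aspirated aliases for the plain "t" syllables in both romaji and katakana. The "_asp" suffix and all the names are made up for illustration. Alias conventions differ between voicebanks, so pick whatever fits yours.

```python
# Syllables with a plain "t"; "chi" and "tsu" are left out because
# they are affricates rather than a plain "t".
T_MORAE = {"ta": "タ", "te": "テ", "to": "ト"}

# Hypothetical marker for the aspirated variant; not a standard convention.
ASPIRATED_SUFFIX = "_asp"


def aspirated_aliases(morae=T_MORAE, suffix=ASPIRATED_SUFFIX):
    """Map each plain alias (romaji and katakana) to an aspirated alias."""
    aliases = {}
    for romaji, kana in morae.items():
        aliases[romaji] = romaji + suffix
        aliases[kana] = kana + suffix
    return aliases


for plain, aspirated in aspirated_aliases().items():
    print(plain, "->", aspirated)
```

    You would still need to record the aspirated samples and point these aliases at them yourself; this only generates the names.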

    Additionally, the level of aspiration in normal speech/singing depends on the following vowel and the specific person's accent, but that's getting a little too deep for me.

    Disclaimer: I'm not an expert in phonetics, so some technical information may be wrong in this. I'm not even an expert in Japanese pronunciation, just this one very specific part of it. So don't call me out on how I said "トイレ" wrong or something.

    Also, once you know this, it sticks out really, really badly in, like, 90% of Western voicebanks.
