Have you had a look at arpasing?
http://arpasing.tubs.moe/en/index.html
Whilst it's a very new recording method, there are lots of resources and different reclists to help you out. I think it might work best for you, since some of its reclists consist of actual words, so you can just pronounce them the way you normally would instead of trying to force out particular sounds, as you might have to with VCCV or CVVC lists. This does mean your UTAU will likely have a noticeable non-native accent, but I don't think that will cause any issues; I personally know a fair few English speakers from all over the world with a wide variety of accents!
Of course, I don't know how easy it will be for you to pronounce the words in the reclist (when I was learning French a few years ago, there were some words I just couldn't say), so you may find you'd rather use a list that is just a string of sounds. You could look at VCCV, which is the other standard for English atm. It's fairly difficult, if only because of the sheer size of the voicebank, but Kohaku Merry has a VCCV English voicebank, so I wouldn't rule it out if you struggle with arpasing.
Sorry this has been so long, but I hope it's at least a bit helpful! UTAU English is pretty difficult in general, so please don't hesitate to ask if you need any more help; if I can't answer, somebody else will know what to do.