The common downside to both is that they're mainly aimed at Americans and rely on the speaker having a General American accent.
Future plans for Arpasing include a British-specific list, which is why the 'ax' phoneme is officially supported (though, to my knowledge, it isn't actually implemented in any current list).
However, since Arpasing is a more open method than VCCV, you could easily try to write your own lists instead of waiting for Kanru. (Though, if you're proficient at writing your own lists, you might as well not base them on any standard to begin with LOL)
(a 2-3 syllable word can become 5-6 tiny notes).
I would argue that Arpasing can actually be worse with this problem. Arpasing works exclusively on diphones, that is, each note only holds two phonemes. Therefore, in order to reconstruct a consonant cluster, you must have a note for every single transition in it. VCCV, however, does have the occasional 3-phoneme note, such as the "str" cluster. But with either list you can probably safely get away with not building your USTs the "perfect/standard" way, and omit some transition notes for the sake of clarity.
Of course, this all comes down to usage, any additions made to the list, and UST editing.
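To make the diphone point concrete, here's a rough sketch of how a word breaks into one note per transition. The function name, the "a b" alias format, and the "-" silence marker are my own assumptions for illustration, so check them against the actual list you're using.

```python
def to_diphones(phonemes, silence="-"):
    """Split a phoneme sequence into one note per transition.

    The "a b" alias format and the "-" silence marker are assumptions
    made for this illustration, not taken from any official list.
    """
    # Pad with silence so the word's entry and exit each get a note too.
    padded = [silence] + list(phonemes) + [silence]
    # Pair every phoneme with the one that follows it.
    return [f"{a} {b}" for a, b in zip(padded, padded[1:])]


# "street" written with ARPABET-style symbols: S T R IY T
notes = to_diphones(["s", "t", "r", "iy", "t"])
print(len(notes), notes)
```

Five phonemes plus the silence on each side gives six notes for a single syllable, which is exactly why cluster-heavy words get so crowded in the piano roll.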
Absolutely. They're both lists of the same language. What you put in is what you get out. For example, Delta's CVVC list may seem like it lends voicebanks a Japanese accent. However, that's because most users of that list have Japanese accents. When a native speaker records it, it sounds like the accent of a native speaker.
Though, one concern with VCCV is that the default BGM has you recording at 120 bpm. (This was for the sake of easy calculations in OTO values. It's not fun trying to do math with weirder tempos!) At times, that may be too fast to record with a singer-ly tone. With Arpasing, you have the freedom to record at whatever speed you like, as Moresampler is analysis-based and doesn't depend on strictly regular timing. So, if you want to take more care in your pronunciations, you may want to turn BGM off.
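If you're wondering why 120 bpm makes the math easy, here's a quick sketch of the beat-length arithmetic. It only shows how long one beat lasts at a given tempo; how any particular OTO template uses that number is not something I'm claiming here, and the function name is just mine.

```python
def beat_ms(bpm):
    """Length of one beat in milliseconds at the given tempo."""
    # One minute is 60,000 ms, split evenly across the beats.
    return 60_000 / bpm


for bpm in (120, 100, 90, 140):
    print(bpm, "bpm ->", round(beat_ms(bpm), 2), "ms per beat")
```

At 120 bpm a beat is a clean 500 ms, while something like 90 or 140 bpm leaves you with messy decimals to carry through every value.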
Good luck, and I hope you figure out which list works best for you.
EDIT: I forgot to mention something! If you're recording voicebanks with someone who doesn't use UTAU, then you'll definitely want to choose a shorter list to keep their interest (especially if it's an easily bored child).