VST! VST! VST!
I wish we could run Vocaloid in more DAWs than just Cubase.
Really, the most important thing is being able to run Vocaloid as a VST in all, or at least many more, DAWs than now; this is the only way the Western pro music community will really begin to take Vocaloid seriously. It needs to be made easier to tune somehow too. Right now I think realistic tuning can be just as time-intensive as the human-singer approach of recording and comping, and unless you want a robotic feel or just really like a certain character or their voice, as the current fan audience does, it can't be taken seriously. This goes especially for English. I think it is not as much of a problem for Japanese, as the language seems to have been handled better and the culture seems to prefer robotic voices with anime characters, but in the West Vocaloid ends up competing with human voices run through Autotune.
If there were some sort of built-in elision, on by default, for pronunciation more appropriate to singing, I think that would help English a lot. As it stands, English P&P, even with the correct dictionary pronunciation that, say, Cyber Diva and some of the new Engloids now have, is easy but very stiff and lifeless. It ignores co-articulations that happen commonly and naturally while singing, such as [e@n] [dh aI] for 'and I' instead of [e@/{nd][aI], or [f j u] for 'If you'. Some consonants are easier to sing, or sound better, at the end of a note, and others are better attached to a vowel at the beginning of a new utterance. This is especially important for casual speech and singing, and doubly so for music like pop or rap, which is meant to be sung like casual speech. Crisp pronunciation can feel cold, lifeless, and robotic, which is great for a certain feel, and people have done a lot with that already, but I know Vocaloid is capable of more than just that.
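To make the 'and I' example concrete, here is a rough sketch of the kind of linking rule I mean. It is not how Vocaloid works internally, just an illustration: `link_words`, the `VOWELS` set, and the `ONSET_FORM` table are all names and data I made up for this, and the vowel list is deliberately partial. The one rule it applies is: when a word ends in a consonant and the next word starts with a vowel, move that consonant onto the next word and swap in its syllable-initial form.

```python
# Hypothetical, partial vowel set, just enough for the example.
VOWELS = {"e@", "aI", "I", "{", "@", "i:", "u:"}

# Assumed mapping from a plain stop to its syllable-initial symbol.
ONSET_FORM = {"d": "dh", "t": "th", "k": "kh", "p": "ph", "b": "bh", "g": "gh"}


def link_words(words, casual=True):
    """Return a copy of `words` (lists of phoneme symbols) with word-final
    consonants moved onto a following vowel-initial word when casual=True."""
    out = [list(w) for w in words]  # copy so the input is left untouched
    if not casual:
        return out  # crisp mode: keep the dictionary pronunciation as-is
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        ends_in_consonant = len(cur) > 1 and cur[-1] not in VOWELS
        starts_with_vowel = bool(nxt) and nxt[0] in VOWELS
        if ends_in_consonant and starts_with_vowel:
            consonant = cur.pop()
            nxt.insert(0, ONSET_FORM.get(consonant, consonant))
    return out


print(link_words([["e@", "n", "d"], ["aI"]]))
# [['e@', 'n'], ['dh', 'aI']]
```

With `casual=False` the same call hands back the stiff dictionary version unchanged, which is the crisp/casual switch in miniature. A real implementation would need many more rules than this one, and probably per-voicebank tuning.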
I think an option to switch from crisp pronunciation to more casual/smooth singing in one click would go a long way toward making English Vocaloid more P&P and easier than recording and comping. It not only has to be easy to work with; it needs to sound effortless as well, and deliver.