VCV
Transitions/Smoothness
Because each VCV sample contains the entire transition from one syllable to the next, VCV is smooth by default. If you place the overlap too deep into the transition, you might lose some of that smoothness, so the safest place to put it is within a steady, consistent part of the previous vowel. The only thing you have to worry about is how the vowels crossfade together, and even then, there's Moresampler, which uses data from an analysis of the sample files to make a smoother transition than you'd get by editing the audio.
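To picture why a steady overlap region matters, here is a minimal sketch of a plain linear crossfade: the tail of the previous note fades out while the head of the next fades in. This is only a mental model under that assumption. UTAU's wavtools actually apply per-note volume envelopes across the overlap, and Moresampler does its own processing, so this is not either tool's implementation. The function name and sample values are hypothetical.

```python
def linear_crossfade(prev_tail, next_head):
    """Mix two equal-length lists of samples with complementary linear ramps.

    prev_tail: end of the previous note's audio (fades out)
    next_head: start of the next note's audio (fades in)
    """
    if len(prev_tail) != len(next_head):
        raise ValueError("overlap regions must be the same length")
    n = len(prev_tail)
    out = []
    for i in range(n):
        # weight of the incoming sample rises from 0 to 1 across the overlap
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return out


# If both vowels sit at the same steady level, the mix stays steady too;
# an unsteady overlap region is where audible bumps come from.
print(linear_crossfade([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]))  # [0.5, 0.5, 0.5]
```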
Ease of Recording
Recording takes a long time, and the recording strings can be confusing to read. However, if you are fluent in reading kana, have an extremely efficient reclist, and use OREMO with a guideBGM, you can knock out a basic Japanese VCV bank in less than 15 minutes.
Ease of Use
With Japanese, you'll have to deal with converters. There are five I can think of: AutoVCV, which is part of UTAU-Synth and shareware PC UTAU; the old CV to VCV plugin; the AutoCVVC plugin, which has a VCV mode; the bizz plugin, which includes VCV conversion; and presamp. AutoVCV and presamp require no editing other than making sure the UST is in hiragana.
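The core idea these converters share can be sketched in a few lines: each note's lyric gets prefixed with the vowel of the note before it, and a phrase start gets "-". This sketch assumes the common "a か" alias style with "- か" at phrase starts and rests written as "R"; the vowel table is a hypothetical subset, and real converters handle much more (ん variants, suffixes, per-bank alias settings).

```python
# Hypothetical, deliberately tiny vowel table; a real converter covers every kana.
KANA_VOWEL = {
    "あ": "a", "か": "a", "さ": "a", "た": "a", "な": "a",
    "い": "i", "き": "i", "し": "i",
    "う": "u", "く": "u", "す": "u",
    "え": "e", "け": "e",
    "お": "o", "こ": "o",
    "ん": "n",
}

REST = "R"  # assumption: rests are written as "R"


def to_vcv(lyrics):
    """Turn a list of hiragana lyrics into VCV-style aliases ("a か", "- か")."""
    aliases = []
    prev_vowel = "-"  # phrase start: there is no previous vowel
    for kana in lyrics:
        if kana == REST:
            aliases.append(REST)
            prev_vowel = "-"  # a rest starts a new phrase
            continue
        aliases.append(f"{prev_vowel} {kana}")
        prev_vowel = KANA_VOWEL[kana]
    return aliases


print(to_vcv(["か", "き", "く", "R", "こ", "ん"]))
# ['- か', 'a き', 'i く', 'R', '- こ', 'o ん']
```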
When it comes to other languages, oh man, using VCV is so much fun. Depending on which samples and clusters are included, you can make smooth results really fast. Sure, you still have to type the aliases in manually, but try out a bank like Adrian, and doing English becomes such a fun process. With fewer notes to contend with, the UST itself is less cluttered.
Ease of OTO
VCV otos are very simple, but very tedious. If you record hiragana strings in OREMO with a guideBGM, you can use its built-in generator and get pretty accurate results. With Tady's generator, you can make aliases for any VCV reclist if you set it up correctly. However, you still have to go in and adjust everything yourself. A Japanese VCV oto is about 7 times the length of a CV one, and a VCV reclist for any other language will be hellishly longer, so much so that VCV reclisters for other languages frequently have to worry about the OTO limit.
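Generators can get close because recording to a guideBGM puts every mora at a predictable time. Here is a minimal sketch of that idea (not OREMO's or Tady's actual code), assuming one mora per beat after a fixed preroll. It writes one oto.ini line per mora in the usual `file.wav=alias,offset,consonant,cutoff,preutterance,overlap` order. The preroll, BPM, file name, and every default timing value are hypothetical placeholders, which is exactly why you still have to adjust everything by hand afterwards.

```python
# Hypothetical mini vowel table for the example recording below.
KANA_VOWEL = {"か": "a", "き": "i", "く": "u", "け": "e", "こ": "o"}


def ms(value):
    """oto.ini values are in milliseconds; round to whole ms for readability."""
    return str(round(value))


def vcv_oto_lines(filename, morae, bpm, preroll_ms,
                  preutter_ms=200.0, overlap_ms=80.0,
                  consonant_ms=300.0, length_ms=450.0):
    """Guess one oto.ini line per mora of a VCV recording made to a guideBGM.

    Assumes mora i starts exactly on beat i after `preroll_ms`; the default
    timing values are placeholders meant to be hand-adjusted afterwards.
    """
    if preroll_ms < preutter_ms:
        raise ValueError("preroll must be at least as long as the preutterance")
    beat_ms = 60000.0 / bpm
    lines = []
    prev_vowel = "-"  # the first mora of a recording is a phrase start
    for i, kana in enumerate(morae):
        onset = preroll_ms + i * beat_ms   # where this mora's consonant should begin
        offset = onset - preutter_ms       # start reading inside the previous vowel
        alias = f"{prev_vowel} {kana}"
        # Offset is measured from the file start; consonant, preutterance and
        # overlap from the offset; a negative cutoff is a length from the offset.
        lines.append(f"{filename}={alias},{ms(offset)},{ms(consonant_ms)},"
                     f"{ms(-length_ms)},{ms(preutter_ms)},{ms(overlap_ms)}")
        prev_vowel = KANA_VOWEL[kana]
    return lines


for line in vcv_oto_lines("_かきくけこ.wav", ["か", "き", "く", "け", "こ"],
                          bpm=120, preroll_ms=1000):
    print(line)
```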
CVVC
Transitions/Smoothness
CVVC is built on CVs, augmented with VCs. There are two main types of VC: the transitional type and the stopping type. The stopping type is mostly used for languages that need consonants at the ends of syllables, so with Japanese, you only need to worry about the transitional type. If it isn't OTO'd correctly, you'll get all the pitfalls of a poor CV bank, plus VCs that can sound awkwardly forced and produce a "double consonant" sound. However, if it's well configured, CVVC is just as smooth as VCV, while offering more flexibility in the timing of the transition itself.
Ease of Recording
CVVC reclists don't need to be strictly in hiragana, since they're typically aliased into kana by hand anyway. This makes them a lot easier to read. Depending on the list, the recording time can be a bit more than a CV bank's, or vastly less than a VCV bank's. It's great for busting out tons of quick voicebanks.
Ease of Use
Use can be tricky compared to VCV. You need to make a lot of small notes to construct the transitions, which can clutter up the UST. However, those tiny notes can be taken advantage of for tuning, and by adjusting the proportions of their lengths, you can artfully change the sound of the transition for effect. Presamp was made for CVVC, so it can automatically turn a hiragana UST into CVVC without you needing to edit it all, which is a very big advantage. The AutoCVVC plugin occasionally glitches, but it allows easier customization.
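What these converters automate is, at heart, note splitting: the end of each note is cut off and replaced by a short VC that leads into the next consonant. This is a minimal sketch under stated assumptions: "a k"-style VC aliases, a fixed hypothetical VC length of 60 ticks (UTAU uses 480 ticks per quarter note), and a tiny hypothetical kana table. Real tools also handle phrase endings, vowel-to-vowel transitions, and per-bank settings.

```python
# Hypothetical mini table: kana -> (consonant, vowel) in romaji.
KANA = {
    "あ": ("", "a"), "い": ("", "i"),
    "か": ("k", "a"), "き": ("k", "i"),
    "さ": ("s", "a"), "し": ("sh", "i"),
    "た": ("t", "a"), "な": ("n", "a"),
}
REST = "R"  # assumption: rests are written as "R"


def to_cvvc(notes, vc_ticks=60):
    """Split CV notes so a VC ("a k") fills the end of each note before a consonant.

    notes: list of (lyric, length_in_ticks).
    vc_ticks: hypothetical fixed VC length; changing the CV/VC proportion is
    what changes the feel of the transition.
    """
    out = []
    for i, (lyric, length) in enumerate(notes):
        nxt = notes[i + 1][0] if i + 1 < len(notes) else REST
        if lyric == REST or nxt == REST:
            out.append((lyric, length))  # no transition into or out of a rest here
            continue
        vowel = KANA[lyric][1]
        consonant = KANA[nxt][0]
        if not consonant or length <= vc_ticks:
            # next note starts with a vowel, or this note is too short to split
            out.append((lyric, length))
            continue
        out.append((lyric, length - vc_ticks))
        out.append((f"{vowel} {consonant}", vc_ticks))
    return out


print(to_cvvc([("か", 480), ("さ", 480), ("あ", 480), ("し", 480), ("R", 480)]))
# [('か', 420), ('a s', 60), ('さ', 480), ('あ', 420), ('a sh', 60), ('し', 480), ('R', 480)]
```

Raising `vc_ticks` stretches the transition into the next consonant, which is the same "adjust the proportions" trick you can do by hand in the UST.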
Ease of OTO
At most, a Japanese CVVC oto might be around 3 times the length of a CV one (disregarding extras). You need a good knowledge of OTO theory to do both the CV and VC parts effectively, but once you've got it down, the OTO goes pretty quickly. Depending on the reclist, you might have a base oto with the aliases and some of the values filled in, but there's no way to generate it the way you can for VCV.
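A base oto can at least supply the alias names, because the transitional VCs are just every vowel paired with every consonant; that product is also why the VC half of the oto grows quickly. This is a minimal sketch with hypothetical vowel and consonant subsets (treating ん as a sixth "vowel" is an assumption that varies by reclist). The offset, preutterance, and overlap values still have to be set by hand.

```python
from itertools import product

# Hypothetical subsets; a full Japanese CVVC list has more consonants.
VOWELS = ["a", "i", "u", "e", "o", "n"]
CONSONANTS = ["k", "s", "t", "n", "h", "m", "y", "r", "w"]


def vc_aliases(vowels, consonants):
    """Every transitional VC alias in the "a k" style: one per vowel/consonant pair."""
    return [f"{v} {c}" for v, c in product(vowels, consonants)]


aliases = vc_aliases(VOWELS, CONSONANTS)
print(len(aliases))   # 54
print(aliases[:3])    # ['a k', 'a s', 'a t']
```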
Is CVVC in any way superior in quality to VCV, or vice versa?
In terms of the end result of using the voicebank, I believe CVVC is more expressive than VCV. Recording a CVVC bank takes less time (even compared to VCV's potentially very efficient workflow), so the samples will be much more consistent. This makes it possible to record more sets for the same bank in less time, so you could easily make a multipitch bank or something similar. VCV has been popular, but mainly because it was the second type of bank developed (CVVC wasn't around for UTAU yet), not necessarily because it was better than CVVC. While it is fun to use, you are pretty much locked into what the bank gives you (unless you add CVVC otos to it), so you can't, for example, make a consonant sound naturally slow when it was originally recorded fast.