It's similar to ENUNU or DiffSinger (currently a bit closer to DiffSinger, I'd guess). But it's also significantly more mature, because it's older and it's paid software with more than one full-time developer.
The AI part tries to replicate the dynamics, pronunciation, timing, and pitch of the vocalist, so yes, that's the bit that makes it sound realistic. The vocoder and good voicebank development practices also help, though.