I am going to add this in first, then the main post.
The fact that Utau still works isn't luck. It was written in Visual Basic 6.0, which was already a legacy language when Utau came out: mainstream support for VB6 ended in 2005 and extended support ended in 2008.
Utau still works because Microsoft chose to keep the VB6 runtime compatible with newer versions of Windows, and it should probably stay that way through at least the next Windows release.
That being said, Utau is still on an old codebase, is still very buggy, and is in need of a successor.
Imo, people shouldn't aim for a Utau equivalent, but for something superior.
One major flaw I see with attempts to replace it is that they end up holding themselves back by trying to be a 1:1 drop-in.
We need something that is backward-compatible so we can load old utau projects and voicebanks but maybe not replicate all the functions exactly.
For example, it could be able to parse an oto.ini but not edit it, because editing can be considered "legacy".
And you can do that without dropping support for things like resamplers.
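To show what "parse but don't edit" could look like, here is a minimal sketch in Python. It assumes the usual oto.ini line layout of `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, and that an empty alias falls back to the file name without its extension. The names `OtoEntry`, `parse_oto_line` and `parse_oto` are made up for the example. The entries are frozen, so the rest of the program can read them but can't change them. Decoding the file is left to the caller, since many voicebanks use Shift-JIS rather than UTF-8.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: entries are read-only once parsed
class OtoEntry:
    wav: str
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float
    overlap: float


def _to_float(field: str) -> float:
    """Treat blank or malformed numeric fields as 0.0 (an assumption)."""
    try:
        return float(field)
    except ValueError:
        return 0.0


def parse_oto_line(line: str) -> Optional[OtoEntry]:
    """Parse one oto.ini line, or return None if it is not an entry."""
    line = line.strip()
    if not line or "=" not in line:
        return None
    wav, _, params = line.partition("=")
    fields = [f.strip() for f in params.split(",")]
    fields += [""] * (6 - len(fields))  # pad short lines
    # Assumed fallback: empty alias -> file name without extension.
    alias = fields[0] or wav.rsplit(".", 1)[0]
    numbers = [_to_float(f) for f in fields[1:6]]
    return OtoEntry(wav, alias, *numbers)


def parse_oto(text: str) -> Dict[str, OtoEntry]:
    """Parse already-decoded oto.ini text into an alias -> entry mapping."""
    entries: Dict[str, OtoEntry] = {}
    for line in text.splitlines():
        entry = parse_oto_line(line)
        if entry is not None:
            entries.setdefault(entry.alias, entry)  # first alias wins
    return entries
```

A frontend could call `parse_oto(raw_bytes.decode("shift_jis", errors="replace"))` when loading a voicebank and hand the offsets to whatever resampler it launches, without ever writing the file back.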
Some ideas because I suck:
-Basic interface for early versions. "keep it simple stupid", don't cram all the functions in buttons on the top and sides. Add functions as you go, not all at once.
-separate envelopes and volume/dynamics editing. As we all know, as soon as we start editing envelopes the smoothness of the render can be affected. It's also cumbersome.
-continuous note editing/tuning, since separating each note slows things down immensely.
-midi-style pitch editing, maybe with control points and Bézier curves on top of freehand drawing.
-automatic note spacing/linking. Be able to ignore slight gaps between notes while also removing the need for rests.
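Two of the ideas above, control-point pitch curves and automatic note linking, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: notes store an absolute start position in ticks (so silence needs no rest note), pitch control points are `(tick, cents)` pairs, and `Note`, `bezier_point`, `link_notes` and the `max_gap` threshold are all made-up names and values.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float]  # (tick, pitch offset in cents)


def bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1]."""
    u = 1.0 - t
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (p0, p1, p2, p3)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


@dataclass(frozen=True)
class Note:
    start: int   # absolute position in ticks
    length: int  # duration in ticks
    lyric: str


def link_notes(notes: List[Note], max_gap: int = 30) -> List[Note]:
    """Stretch each note over a small gap before the next note.

    Gaps larger than max_gap are left alone and simply render as silence.
    """
    ordered = sorted(notes, key=lambda n: n.start)
    linked = []
    for i, note in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1].start - (note.start + note.length)
            if 0 < gap <= max_gap:
                note = replace(note, length=note.length + gap)
        linked.append(note)
    return linked
```

An editor could sample `bezier_point` at a fixed step to build the pitch data a resampler expects, and run `link_notes` just before rendering so the stored project keeps the user's original timing.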
Superficial things like logos and themes should be at the bottom of the list and functionality and stability should go first.
...and don't bother with mobile versions. Phones and tablets generally aren't fast enough, and unless you are also willing to make a compatible engine from scratch, it's probably not worth it.
If possible it would also be a good idea to discuss things with people who make or have made backends (the resamplers and wavtools) to get their opinion on how aspects such as commands and temporary files should be taken care of.
Another idea is to make sure it's inherently multiplatform, without the need to make multiple versions in the future.
You could use Mono (mono-project.com) and an open source GUI framework like Gtk to build the front-end portion at least.
These are just random ideas thrown out there. Some of them I only mention because they always get brought up in threads like this.
EDIT: Other projects like Cadencii, OpenUtau, etc. do exist, but it may not be the best idea to use those.
OpenUtau is an interface with no functionality (other than already being able to open UST files), so that may be an alright option, but Cadencii surely wouldn't be. Being more of a way to use Vocaloid 1 and 2 in the same UI, Cadencii is already quite bloated and clumsy. You'd most likely have to do a ton of scrapping and throwing away.