The next generation of UTAU - changes we'd like to see

Axiomatic-Vertex

Retired User
Defender of Defoko
Hi!

I'm here to discuss some features and changes that I, and you, would like to see in the next major version of UTAU. I am aware that this subject has been discussed before, and I would like to bring it up again.

So let's begin.



Change 1: instant rendering
yes obviously this would be fantastic

I think this can be achieved the same way voicebanks are made in DeepVocal: the voicebank would be built or compiled into one package, much like you'd compile a program into a ready-to-use .exe file. This would allow for instant and responsive playback, since the rendered data has already been made.

I'm not sure how you would go about testing the voicebank for errors or configuring it in the same editor - the only way would be to keep the voicebank's project and change that accordingly, then compile it again and put it into UTAU to test with. If you know what I mean.

One problem that could arise from this, however, is that you'll be limited in which resamplers you can use. If a system like this hypothetically does get implemented, you'll only get to use the new resampler that's included with UTAU, as it would be designed for this new system. You could only hope that the resampler does a good job at resampling.
Heck, Moresampler could be implemented somehow! (that would be good tho)


Change 2: UI
yeah a big UI change would be great

I spent almost 2 hours on this new UI mockup:
upload_2020-8-28_13-14-33.png

It's not terribly great, but there are a few improvements over the UI we already have!

I've made the UI mockup in MockFlow. You can register for free - however you'll only be able to make one UI mockup.


-2.a: Notes-

Ok so number one:
NO REST NOTES
ok that's it

Secondly, the new UI should support CV, VCV, CVVC/VCCV and Arpasing/Cz lyrics and display the phonetics for the notes accordingly, like these mockups:
upload_2020-8-28_10-34-34.png
VCV

upload_2020-8-28_10-36-25.png
CVVC

upload_2020-8-28_10-46-26.png
Arpasing

upload_2020-8-28_10-53-12.png
Cz English

You should also have the ability to toggle this phoneme view on or off, and additionally for VCV you can toggle rest notes on or off as well.
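As a rough illustration of how an editor could derive the VCV phonetics shown in the mockups from plain CV lyrics, here is a minimal Python sketch. It assumes romaji lyrics and the common VCV alias convention of "previous vowel + space + CV", with "-" standing in at the start of a phrase. The function names `trailing_vowel` and `cv_to_vcv` are made up for this example and are not part of UTAU.

```python
VOWELS = set("aiueo")

def trailing_vowel(lyric):
    """Return the sound a romaji lyric ends on: a vowel, 'n', or '-' if unknown."""
    if lyric == "n":
        return "n"
    last = lyric[-1:]
    return last if last in VOWELS else "-"

def cv_to_vcv(lyrics):
    """Convert a phrase of CV lyrics into VCV-style aliases."""
    aliases = []
    previous = "-"  # "-" marks the start of a phrase (no preceding vowel)
    for lyric in lyrics:
        aliases.append(f"{previous} {lyric}")
        previous = trailing_vowel(lyric)
    return aliases

print(cv_to_vcv(["ka", "sa", "n", "po"]))
# ['- ka', 'a sa', 'a n', 'n po']
```

Toggling the phoneme view would then just be a matter of showing either the typed lyric or the derived alias on each note.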


-2.b: Toolbars-
a fair bit simpler than the toolbars we already have:
upload_2020-8-28_13-1-41.png


-2.c: Envelopes and pitch bends-
Envelopes will be sitting on the bottom of the screen as if they're parameters in other software like Synth V:
upload_2020-8-28_13-2-24.png

There are also checkboxes for turning pitch bends, VCV rest notes and phonemes on or off, as mentioned before.
If pitch lines were turned on they would probably look like this:
upload_2020-8-28_13-15-34.png
(ye I know I had to improv in paint as;lfjas;ljf;j)


-2.d: Themes-

There should be a few themes built into the editor that you can pick from, as well as a theme editor.



Change 3: Multiple tracks
yes have multiple tracks to work with
not too much to explain here, it is what it is



Welp I can't think of anything else significant here, this took me a while
idk what do you think? :P
Please let me know!
 

sangv

Ruko's Ruffians
Defender of Defoko
While the idea of instant rendering is nice, I feel like that would probably lock everyone down to just one resampler with the way you've suggested implementing it. I'm not really sure how else it would be implemented though because I don't really know too much about the technology side of vocalsynth.
I also feel like, to be able to have automatic note creation based on word/phoneme input for CVVC voicebanks, it would require extra configuration. But again, still a nice idea, and could probably work I think!
Also, in the UI mockup you created there's no visible way to change tempo.
 

sof¡¡¡a

Ruko's Ruffians
Defender of Defoko
I agree with what you're proposing. UTAU is way harder to use than other synths, and making it more user-friendly would really be a game-changer.
I can compare your mockup's interface to Synthesizer V (which is an awesome reference for a good synth IMO). It's really pretty; and... I think that changing UTAU's interface would be a good idea. I feel that it's a bit outdated and that it has stayed the same for years and years. A little change would be great.
Something I would LOVE to be able to do is to edit pitch curves properly without needing external plugins. And, this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer.

Those are the things I can think of. UTAU needs a bit of change; it is a great synth and it can be even better! :D
 

sangv

Ruko's Ruffians
Defender of Defoko
And, this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer.
I would be okay with this only if the resampler we got forced to use was something flexible with lots of flags and that tends to work well with most voicebanks, like moresampler. If we were stuck with resampler.exe or doppeltler I would probably just stick with older UTAU versions.
 

VocAddict

The Voice Within Us
Defender of Defoko
Real-time synthesis is probably not possible with the way resamplers and wavtools are currently coded: the output is first resampled completely and then stitched together by the wavtool, rather than being processed in the parallel manner that real-time synthesis requires. Changing that would probably require reprogramming (which probably won't happen, considering most devs aren't active any more).

Another way to implement it would probably be to extend the batch process that's currently used for rendering so that it also applies to wavtools, but I'm not sure how feasible that would be.
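To make the difference between the two approaches a bit more concrete, here is a minimal Python sketch of rendering notes concurrently and then stitching them in score order. `resample_note` and `stitch` are hypothetical stand-ins for a real resampler and wavtool (which are separate programs in UTAU); here they only produce and join lists of numbers.

```python
from concurrent.futures import ThreadPoolExecutor

def resample_note(note):
    """Hypothetical stand-in for a resampler: returns fake samples for one note."""
    pitch, length = note
    return [pitch] * length

def stitch(chunks):
    """Hypothetical stand-in for a wavtool: joins rendered chunks in order."""
    output = []
    for chunk in chunks:
        output.extend(chunk)
    return output

def render_parallel(notes, workers=4):
    """Render every note concurrently, then stitch the results in score order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, even if notes finish out of order
        chunks = list(pool.map(resample_note, notes))
    return stitch(chunks)

notes = [(60, 2), (62, 1), (64, 3)]
print(render_parallel(notes))
# [60, 60, 62, 64, 64, 64]
```

Note that this only speeds up batch rendering. Stitching still waits for every note to finish, so it is not real-time synthesis.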

The one thing I really want changed in an update is the increase in the oto limit, the removal of rests, and a touch-up so we no longer feel like we are in the age of XP. UTAU is of the nature that the community is the one that allows new features to be added in with the use of plugins and the like. Inbuilt phoneme systems would be nice, but I believe that would be better off as just having a dictionary feature and having the community add whatever reclists they want.

this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer
Truly an unpopular opinion in my case lol. The ability to choose which resampler and wavtool you want to use is what makes UTAU special. I would hate having to be locked down to one resampler (and if that was supposed to happen, it would be resampler/doppeltler) like UTAU-Synth. Regardless of the improvements made to the software, I would not upgrade because of that fact alone.
 

Axiomatic-Vertex

Retired User
Retired User
Defender of Defoko
Thread starter
Inbuilt phoneme systems would be nice, but I believe that would be better off as just having a dictionary feature and having the community add whatever reclists they want.
That's a good idea! I was thinking that users can make their own reclist dictionaries based on the voicebanks' oto files, maybe?
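A minimal Python sketch of how such a dictionary could be built from a voicebank's oto entries. It assumes the usual oto.ini line layout of `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, and that an empty alias falls back to the file name without ".wav". The function names and the sample lines are made up for this example.

```python
def parse_oto_line(line):
    """Split one oto.ini entry into (wav file name, alias)."""
    filename, _, params = line.strip().partition("=")
    alias = params.split(",")[0]
    # Assumption: an empty alias falls back to the file name without ".wav"
    if not alias:
        alias = filename[:-4] if filename.lower().endswith(".wav") else filename
    return filename, alias

def build_alias_dictionary(oto_lines):
    """Map each alias to the sample file that provides it."""
    dictionary = {}
    for line in oto_lines:
        if "=" not in line:
            continue  # skip blank or malformed lines
        filename, alias = parse_oto_line(line)
        dictionary.setdefault(alias, filename)  # keep the first entry for duplicates
    return dictionary

sample = [
    "ka.wav=,50,100,-200,80,20",
    "_akasa.wav=a ka,300,120,-250,90,30",
    "_akasa.wav=a sa,700,120,-250,90,30",
]
print(build_alias_dictionary(sample))
# {'ka': 'ka.wav', 'a ka': '_akasa.wav', 'a sa': '_akasa.wav'}
```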

Real-time synthesis is probably not possible with the way resamplers and wavtools are currently coded: the output is first resampled completely and then stitched together by the wavtool, rather than being processed in the parallel manner that real-time synthesis requires. Changing that would probably require reprogramming (which probably won't happen, considering most devs aren't active any more).
I did mention that this would be a new resampler that was remade to do this sort of thing, not the resampler we already use.

The one thing I really want changed in an update is the increase in the oto limit, the removal of rests, and a touch-up so we no longer feel like we are in the age of XP.
Hell yeah, rest notes suck like hell
The interface does indeed look like something straight out of Windows 98. It still works well, though

The main issue also is that the framework that UTAU uses is an old version of Visual Basic, which apparently will go out of support in newer versions of Windows. This is pretty problematic

The ability to choose which resampler and wavtool you want to use is what makes UTAU special. I would hate having to be locked down to one resampler (and if that was supposed to happen, it would be resampler/doppeltler) like UTAU-Synth. Regardless of the improvements made to the software, I would not upgrade because of that fact alone.
Yeah...it's kinda tricky now - you'd want instant synthesis, but at the same time you won't be able to use different resamplers. I like Moresampler and I pretty much use it all the time
It would have to be one or the other I guess
 

Lyouuv

Ruko's Ruffians
Defender of Defoko
Ok so number one:
NO REST NOTES
ok that's it

Secondly, the new UI should support CV, VCV, CVVC/VCCV and Arpasing/Cz lyrics and display the phonetics for the notes accordingly, like these mockups:
I think this is a good idea! Rests are just super annoying to work with imo and I agree that it would make things a lot easier if we didn't have to deal with so many of them. If rests were removed think of how much easier inserting breaths would be!!! Plus it's really easy to accidentally either delete a small rest, insert one or change its length and it throws the entire ust timing off :cry:
btw I love love love your note mock up <3

Something I would LOVE to be able to do is to edit pitch curves properly without needing external plugins. And, this may be an unpopular opinion but...
Omigosh yes THIS!! And maybe this is too much to ask, but I think it would be a good idea if Ameya were to take note of some popular plugins like extended envelope edit and pitch trace, and implement those features into the new UTAU? Sometimes plugins are finicky too, so hopefully having these features built into the actual program would make this less of an issue (or not an issue at all).

And one more thing... plsplspls I wanna be able to edit keybind commands!!! Like adding a keybind to add a control point on a pitch curve, or a keybind to instantly bring up the envelope window. This would be so wonderful for all the lazy tuners like me :love:
 

VocAddict

The Voice Within Us
Defender of Defoko
The main issue also is that the framework that UTAU uses is an old version of Visual Basic, which apparently will go out of support in newer versions of Windows.
I'm tired of people spreading false information about this. VB6 is not supported by Microsoft any more, but due to its use in a large range of enterprise software, Microsoft has no intention of discontinuing its ability to be used on Windows. It's legacy code but it will continue to work for the foreseeable future, as Microsoft continues to ship the binaries with every release of Windows. Since Windows 10 is their last Windows, unless they say otherwise, VB6 will continue to be shipped for programs that require it.
 

Axiomatic-Vertex

Retired User
Retired User
Defender of Defoko
Thread starter
I'm tired of people spreading false information about this. VB6 is not supported by Microsoft any more, but due to its use in a large range of enterprise software, Microsoft has no intention of discontinuing its ability to be used on Windows. It's legacy code but it will continue to work for the foreseeable future, as Microsoft continues to ship the binaries with every release of Windows. Since Windows 10 is their last Windows, unless they say otherwise, VB6 will continue to be shipped for programs that require it.
Really? I had no idea, I kept hearing that it's going to be unsupported soon and that UTAU won't be able to be used anymore as a result, I was completely wrong!
But I still agree with the fact that UTAU should be using a newer or even different framework, simply because while VB6 is legacy and it continues to work, it's still very outdated.
 

Lyouuv

Ruko's Ruffians
Defender of Defoko
How can we reach AMEYA/AYAME? It would be great if she could listen to these great ideas and maybe consider them
I think this would be a good idea too but I wonder if Ameya already gets a lot of this from the Japanese speaking side of the community (but I could be wrong?? I just imagine getting frequent email/Twitter DMs of people asking for an UTAU update.)

But they were working on phavoco and doppeltler just this year, so I'm still secretly hopeful an update is slowly being worked on, or is something in the back of their head at least.
 

NordGeit

Your stubborn Yotsuba Channel frequenter. Direct.
Supporter
Defender of Defoko
Wait... What's this about a next generation and next major version?

I know UTAU's not dead-end software exactly, but the last update I've seen is from 2014, with Ameya working on some resampling...?

Hey could I get a fill-in plz

Nevertheless- I'll take Norwegian lettering support (æøå) for a Norwegian voicebank yeskplzthx
 

vlbonnie

Ritsu's Renegades
Defender of Defoko
Wait... What's this about a next generation and next major version?

I know UTAU's not dead-end software exactly, but the last update I've seen is from 2014, with Ameya working on some resampling...?

Hey could I get a fill-in plz

Nevertheless- I'll take Norwegian lettering support (æøå) for a Norwegian voicebank yeskplzthx
Earlier this year (I think? I'm awful at keeping track of time) Ameya/Ayame said they'd be trying to put out a new UTAU update, but it would come in pieces. Doppeltler, f2resamp, and wavtool2 were some of those pieces, just as replacements for the regular resampler and wavtool.

Also, TIL UTAU doesn't support Norwegian lettering. Oof.
 

nonino44441

Momo's Minion
Hey, my question may be stupid, but... why isn't anybody doing it? I'm sure some people on this forum must have some programming skills, so why isn't anybody working on a prototype (at least something small like a new UI) and sending it to Ameya to get them to do something about it?
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko

Oxygen Dioxide

Momo's Minion
In my opinion, the biggest limitation of UTAU is its engine API, which only supports "single note" synthesis. This means the engine can only synthesize one note at a time, and by design it can't access the other notes. This prevents the implementation of AI engines on UTAU, because AI engines train and synthesize by sentence, and they have to know a whole sentence before they can synthesize it. As a result, NEUTRINO doesn't support UTAU. It only supports MusicXML input. If a new frontend is developed just as "a better UTAU", it will never be the successor of UTAU.

Actually, the UTAU API (by design) can't meet the needs of Moresampler, because it has to do some "post-processing" (converting LLSM back to audio) after finishing a song. So it uses a "hacky" method: opening the script to see if this is the last note.

The successor of UTAU should be open-source, compatible with some existing voicebanks in its own way (like DeepVocal, which supports CVVC), and have an extensible engine API.
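To illustrate the difference between a per-note API and a phrase-level one, here is a minimal Python sketch. Every name in it (`Note`, `NoteEngine`, `PhraseEngine`, `PerNoteAdapter`, `DummyNoteEngine`) is hypothetical and only meant to show the shape of such an interface, not any existing UTAU or successor API.

```python
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class Note:
    lyric: str
    pitch: int   # MIDI note number
    length: int  # length in ticks

class NoteEngine(Protocol):
    """Per-note style: the engine sees exactly one note per call."""
    def render_note(self, note: Note) -> List[float]: ...

class PhraseEngine(Protocol):
    """Phrase style: the engine receives every note of a phrase at once."""
    def render_phrase(self, notes: List[Note]) -> List[float]: ...

class PerNoteAdapter:
    """Lets a per-note engine plug into a phrase-level API."""
    def __init__(self, engine: NoteEngine):
        self.engine = engine

    def render_phrase(self, notes: List[Note]) -> List[float]:
        samples: List[float] = []
        for note in notes:
            samples.extend(self.engine.render_note(note))
        return samples

class DummyNoteEngine:
    """Toy engine: one sample per tick, valued by pitch."""
    def render_note(self, note: Note) -> List[float]:
        return [float(note.pitch)] * note.length

phrase = [Note("ka", 60, 2), Note("sa", 62, 1)]
print(PerNoteAdapter(DummyNoteEngine()).render_phrase(phrase))
# [60.0, 60.0, 62.0]
```

With a phrase-level interface, existing per-note engines can still be wrapped by an adapter like this, while sentence-based engines get the whole phrase they need. The reverse is not possible, since a per-note call never sees the neighbouring notes.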

Here is my article about this issue (in Chinese): https://www.bilibili.com/read/cv10500223

There is a new frontend called "Infinity" in development. Here is a video: https://www.bilibili.com/video/BV16i4y1u7iF
It will probably be "the successor of utau" in my opinion after its release.
 

Kazumimi

Ritsu's Renegades
Defender of Defoko
Hmm... ultimately there are few features I really care about having implemented in UTAU. I'm going to be unpopular and say I actually like the UI and can't say I'm fond of the UI that other "UTAU" software uses. What I want to see in UTAU isn't a UI overhaul, but rather some quality of life things; i.e. an envelope editing method more like that of UTAU-Synth, as well as parameters for adjusting the flags. Also, real-time/instant rendering would be nice, but I don't want to be locked into using only one resampler. AI synthesis would be cool--I'd love to hear what a Tei or Teto AI would sound like, hehe.
Also, Vocaloid/SynthV style English (as in, you enter the word(s) and it puts the phonemes in for you) would be nice.
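A minimal Python sketch of what that word-to-phoneme lookup could look like. The two dictionary entries use Arpabet-style symbols and are purely illustrative. A real editor would load a full pronunciation dictionary from a file, and the name `words_to_phonemes` is made up for this example.

```python
# Tiny illustrative dictionary; a real one would be loaded from a file
PHONEME_DICT = {
    "hello": ["hh", "ah", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

def words_to_phonemes(text, dictionary=PHONEME_DICT):
    """Look up each word; unknown words are returned unchanged for manual entry."""
    result = []
    for word in text.lower().split():
        result.append(dictionary.get(word, [word]))
    return result

print(words_to_phonemes("Hello world again"))
# [['hh', 'ah', 'l', 'ow'], ['w', 'er', 'l', 'd'], ['again']]
```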
 
