The next generation of UTAU - changes we'd like to see

Axiomatic-Vertex · Aug 27, 2020

Hi!

I'm here to discuss some features and changes that I (and hopefully you) would like to see in the next major version of UTAU. I'm aware that this subject has been discussed before, but I would like to bring it up again.

So let's begin.

Change 1: instant rendering
yes obviously this would be fantastic

I think this could be achieved the same way voicebanks are made in DeepVocal: the voicebank gets built, or compiled, into one package, much like you'd compile a program into a ready-to-use .exe file. This would allow instant, responsive playback, since the rendered data would already exist.

I'm not sure how you would test the voicebank for errors or configure it in the same editor. The only way would be to keep the voicebank's project, edit that, then compile it again and load it into UTAU to test. If you know what I mean.

One problem that could arise from this, however, is that you'd be limited in which resamplers you can use. If a system like this were implemented, you'd only be able to use the new resampler included with UTAU, since it would be designed for this new system. You could only hope that it does a good job at resampling.
Heck, maybe Moresampler could be built in somehow! (That would be good, though.)

Change 2: UI
yeah a big UI change would be great

I spent almost 2 hours on this new UI mockup:

It's not terribly great, but there are a few improvements over the UI we already have!

I've made the UI mockup in MockFlow. You can register for free - however you'll only be able to make one UI mockup.

-2.a: Notes-

Ok so number one:
NO REST NOTES
ok that's it

Secondly, the new UI should support CV, VCV, CVVC/VCCV and Arpasing/Cz lyrics and display the phonetics for the notes accordingly, like these mockups:

VCV

CVVC

Arpasing

Cz English

You should also have the ability to toggle this phoneme view on or off, and additionally for VCV you can toggle rest notes on or off as well.

-2.b: Toolbars-
a fair bit simpler than the toolbars we already have:

-2.c: Envelopes and pitch bends-
Envelopes would sit at the bottom of the screen, like the parameter panels in other software such as Synth V:

There are also checkboxes for turning pitch bends, VCV rest notes and phonemes on or off, as mentioned before.
If pitch lines were turned on, they would probably look like this:

(ye I know I had to improv in paint as;lfjas;ljf;j)

-2.d: Themes-
There should be a few themes built into the editor that you can pick from, as well as a theme editor.

Change 3: Multiple tracks
yes have multiple tracks to work with
not too much to explain here, it is what it is

Welp I can't think of anything else significant here, this took me a while
idk what do you think? :P
Please let me know!

sangv · Aug 27, 2020

While the idea of instant rendering is nice, I feel like that would probably lock everyone down to just one resampler with the way you've suggested implementing it. I'm not really sure how else it would be implemented though because I don't really know too much about the technology side of vocalsynth.
I also feel like, to be able to have automatic note creation based on word/phoneme input for CVVC voicebanks, it would require extra configuration. But again, still a nice idea, and could probably work I think!
Also, in the UI mockup you created there's no visible way to change tempo.

Axiomatic-Vertex · Aug 27, 2020

tempo is here, yeah it's kinda hard to see:

also I didn't think about the resampler situation either, that's a really good point.

sof¡¡¡a · Aug 27, 2020

I agree with what you're proposing. UTAU is way harder to use than other synths, and making it more user-friendly would really be a game-changer.
I can compare your mockup's interface to Synthesizer V (which is an awesome reference for a good synth IMO). It's really pretty; and... I think that changing UTAU's interface would be a good idea. I feel that it's a bit outdated and that it has stayed the same for years and years. A little change would be great.
Something I would LOVE to be able to do is to edit pitch curves properly without needing external plugins. And, this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer.

Those are the things I can think of. UTAU needs a bit of change; it is a great synth and it can be even better! :D

sangv · Aug 27, 2020

AnnYann said:
And, this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer.

I would be okay with this only if the resampler we got forced to use was something flexible with lots of flags and that tends to work well with most voicebanks, like moresampler. If we were stuck with resampler.exe or doppeltler I would probably just stick with older UTAU versions.

VocAddict · Aug 27, 2020

Real-time synthesis probably isn't possible the way resamplers and wavtools are currently coded: each note's output is first resampled completely and then stitched together by the wavtool, which is not the parallel kind of processing real-time synthesis requires. Changing that would probably require reprogramming, which probably won't happen, considering most devs aren't active anymore.

Another way to implement it would probably be to extend the batch processing that's currently done for rendering so it also applies to wavtools, but I'm not sure how feasible that would be.
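The render flow described above can be pictured with a toy model in Python. Assumptions: audio is a plain list of floats, `resample_note` stands in for one resampler call (it just generates a sine tone) and `wavtool_append` for one wavtool call (a linear crossfade over the overlap); all names are hypothetical and are not UTAU's actual interfaces. It shows why only the first step batches cleanly: every note is resampled on its own, while the wavtool has to stitch clips strictly in order.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

SAMPLE_RATE = 44100  # assumed output sample rate


@dataclass
class Note:
    pitch_hz: float  # hypothetical: target pitch of the note
    length_ms: int   # note length in milliseconds
    overlap_ms: int  # how far this note overlaps the previous one


def resample_note(note: Note) -> list[float]:
    """Stand-in for one resampler call: renders a single note in isolation.
    A real resampler pitch-shifts and stretches a recording; this just makes a sine tone."""
    n = SAMPLE_RATE * note.length_ms // 1000
    return [math.sin(2 * math.pi * note.pitch_hz * i / SAMPLE_RATE) for i in range(n)]


def wavtool_append(out: list[float], clip: list[float], overlap_ms: int) -> None:
    """Stand-in for one wavtool call: appends a clip, crossfading over the overlap."""
    ov = min(SAMPLE_RATE * overlap_ms // 1000, len(out), len(clip))
    start = len(out) - ov
    for i in range(ov):
        t = (i + 1) / (ov + 1)  # fade weight rises from ~0 to ~1 across the overlap
        out[start + i] = out[start + i] * (1 - t) + clip[i] * t
    out.extend(clip[ov:])


def render(notes: list[Note], workers: int = 4) -> list[float]:
    # Step 1: notes are independent, so they can be batched in parallel.
    # (Real resamplers run as separate processes, so a thread pool waiting on them
    # would overlap their work; this pure-Python stand-in gains no real speed.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        clips = list(pool.map(resample_note, notes))
    # Step 2: the wavtool stitches clips one after another, in order.
    out: list[float] = []
    for note, clip in zip(notes, clips):
        wavtool_append(out, clip, note.overlap_ms)
    return out
```

For example, two 100 ms notes where the second overlaps the first by 20 ms give 4410 + 4410 - 882 = 7938 samples at 44.1 kHz.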

The things I really want changed in an update are the increase in the oto limit, the removal of rests, and a touch-up so we no longer feel like we're in the age of XP. UTAU is the kind of program where the community adds new features through plugins and the like. Inbuilt phoneme systems would be nice, but I believe they'd be better off as just a dictionary feature, with the community adding whatever reclists they want.

AnnYann said:
this may be an unpopular opinion but... I would like to get rid of that awful lot of resamplers and wavtools sitting on my computer

Truly an unpopular opinion in my case lol. The ability to choose which resampler and wavtool you want to use is what makes UTAU special. I would hate having to be locked down to one resampler (and if that was supposed to happen, it would be resampler/doppeltler) like UTAU-Synth. Regardless of the improvements made to the software, I would not upgrade because of that fact alone.

Axiomatic-Vertex · Aug 27, 2020

VocAddict said:
Inbuilt phoneme systems would be nice, but I believe that would be better off as just having a dictionary feature and having the community add whatever reclists they want.

That's a good idea! I was thinking that users can make their own reclist dictionaries based on the voicebanks' oto files, maybe?
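A user-made dictionary checked against a voicebank's oto could look roughly like this minimal sketch. Only the oto.ini entry layout (`file.wav=alias,offset,consonant,cutoff,preutterance,overlap`) comes from UTAU; the `word = alias, alias, ...` dictionary format and all function names are hypothetical, and treating a blank alias as the file name is an assumption.

```python
def oto_aliases(oto_text: str) -> set[str]:
    """Collect every alias defined in oto.ini text.

    Each entry looks like file.wav=alias,offset,consonant,cutoff,preutterance,overlap.
    Assumption: a blank alias falls back to the file name without its extension.
    """
    aliases: set[str] = set()
    for line in oto_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        filename, params = line.strip().split("=", 1)
        alias = params.split(",", 1)[0]
        aliases.add(alias or filename.rsplit(".", 1)[0])
    return aliases


def parse_dictionary(text: str) -> dict[str, list[str]]:
    """Parse a hypothetical user dictionary: one 'word = alias, alias, ...' per line."""
    entries: dict[str, list[str]] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        word, rhs = line.split("=", 1)
        entries[word.strip().lower()] = [a.strip() for a in rhs.split(",") if a.strip()]
    return entries


def lyrics_to_aliases(words: list[str], dictionary: dict[str, list[str]],
                      available: set[str]) -> tuple[list[str], list[str]]:
    """Expand lyrics into aliases, keeping only those the voicebank actually has.

    Returns (aliases found, words or aliases that could not be resolved).
    """
    found: list[str] = []
    missing: list[str] = []
    for word in words:
        entry = dictionary.get(word.lower())
        if entry is None:
            missing.append(word)  # the word isn't in the dictionary at all
            continue
        for alias in entry:
            (found if alias in available else missing).append(alias)
    return found, missing
```

Because the dictionary lives per reclist, a CVVC user and a VCV user could each write their own entries, and the `missing` list would show which aliases their oto still lacks.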

VocAddict said:
Real time synthesis is probably not possible to do the way resamplers and wavtools are currently coded as output is resampled first completely and then stitched together with the wavtool and not something of a parallel nature that is required for real time synthesis, and that would probably require reprogramming (which probably won't happen considering most devs aren't active any more).

I did mention that this would be a new resampler that was remade to do this sort of thing, not the resampler we already use.

VocAddict said:
The things I really want changed in an update are the increase in the oto limit, the removal of rests, and a touch-up so we no longer feel like we're in the age of XP.

Hell yeah, rest notes suck like hell
The interface does indeed look like something straight out of Windows 98. It still works well, though

The main issue, though, is that the framework UTAU uses is an old version of Visual Basic, which apparently will go out of support in newer versions of Windows. This is pretty problematic.

VocAddict said:
The ability to choose which resampler and wavtool you want to use is what makes UTAU special. I would hate having to be locked down to one resampler (and if that was supposed to happen, it would be resampler/doppeltler) like UTAU-Synth. Regardless of the improvements made to the software, I would not upgrade because of that fact alone.

Yeah...it's kinda tricky now - you'd want instant synthesis, but at the same time you won't be able to use different resamplers. I like Moresampler and I pretty much use it all the time
It would have to be one or the other I guess

Lyouuv · Aug 28, 2020

Axiomatic-Vertex said:
Ok so number one:
NO REST NOTES
ok that's it

Secondly, the new UI should support CV, VCV, CVVC/VCCV and Arpasing/Cz lyrics and display the phonetics for the notes accordingly, like these mockups:

I think this is a good idea! Rests are just super annoying to work with IMO, and I agree that it would make things a lot easier if we didn't have to deal with so many of them. If rests were removed, think of how much easier inserting breaths would be!!! Plus, it's really easy to accidentally delete a small rest, insert one, or change its length, and that throws the entire UST timing off :cry:

btw I love love love your note mock up <3

AnnYann said:
Something I would LOVE to be able to do is to edit pitch curves properly without needing external plugins. And, this may be an unpopular opinion but...

Omigosh yes, THIS!! And maybe this is too much to ask, but I think it would be a good idea if Ameya took note of some popular plugins like extended envelope edit and pitch trace and implemented those features in the new UTAU. Plugins are sometimes finicky too, so hopefully having these features built into the program itself would make that less of an issue, or not an issue at all.

And one more thing... plsplspls I wanna be able to edit keybind commands!!! Like adding a keybind to add a control point on a pitch curve, or a keybind to instantly bring up the envelope window. This would be so wonderful for all the lazy tuners like me :love:

VocAddict · Aug 28, 2020

Axiomatic-Vertex said:
The main issue also is that the framework that UTAU uses is an old version of Visual Basic, which apparently will go out of support in newer versions of Windows.

I'm tired of people spreading false information about this. VB6 is no longer supported by Microsoft, but due to its use in a large range of enterprise software, Microsoft has no intention of removing the ability to run it on Windows. It's legacy code, but it will continue to work for the foreseeable future, as Microsoft continues to ship the binaries with every release of Windows. Since Windows 10 is their last Windows, unless they say otherwise, VB6 will continue to be shipped for programs that require it.

Axiomatic-Vertex · Aug 28, 2020

VocAddict said:
I'm tired of people spreading false information about this. VB6 is no longer supported by Microsoft, but due to its use in a large range of enterprise software, Microsoft has no intention of removing the ability to run it on Windows. It's legacy code, but it will continue to work for the foreseeable future, as Microsoft continues to ship the binaries with every release of Windows. Since Windows 10 is their last Windows, unless they say otherwise, VB6 will continue to be shipped for programs that require it.

Really? I had no idea, I kept hearing that it's going to be unsupported soon and that UTAU won't be able to be used anymore as a result, I was completely wrong!
But I still agree with the fact that UTAU should be using a newer or even different framework, simply because while VB6 is legacy and it continues to work, it's still very outdated.

sof¡¡¡a · Aug 29, 2020

How can we reach AMEYA/AYAME? It would be great if she could listen to these great ideas and maybe consider them

Axiomatic-Vertex · Aug 29, 2020

I did this a few days ago:
https://twitter.com/AxiomaticVertex/status/1299186660261806080

sof¡¡¡a · Aug 29, 2020

Axiomatic-Vertex said:
I did this a few days ago:
https://twitter.com/AxiomaticVertex/status/1299186660261806080

That's great!

Lyouuv · Aug 30, 2020

AnnYann said:
How can we reach AMEYA/AYAME? It would be great if she could listen to these great ideas and maybe consider them

I think this would be a good idea too but I wonder if Ameya already gets a lot of this from the Japanese speaking side of the community (but I could be wrong?? I just imagine getting frequent email/Twitter DMs of people asking for an UTAU update.)

But they were working on phavoco and doppeltler just this year, so I'm still secretly hopeful that an update is slowly being worked on, or is at least something in the back of their head.

NordGeit · Oct 17, 2020

Wait... What's this about a next generation and next major version?

I know UTAU's not dead-end software exactly, but the last update I've seen is from 2014, with Ameya working on some resampling...?

Hey could I get a fill-in plz

Nevertheless, I'll take Norwegian lettering support (æøå) for a Norwegian voicebank yeskplzthx

vlbonnie · Oct 17, 2020

NordGeit said:
Wait... What's this about a next generation and next major version?

I know UTAU's not dead-end software exactly, but the last update I've seen is from 2014, with Ameya working on some resampling...?

Hey could I get a fill-in plz

Nevertheless, I'll take Norwegian lettering support (æøå) for a Norwegian voicebank yeskplzthx

Earlier this year (I think? I'm awful at keeping track of time) Ameya/Ayame said they'd be trying to put out a new UTAU update, but it would come in pieces. Doppeltler, f2resamp, and wavtool2 were some of those pieces, just as replacements for the regular resampler and wavtool.

Also, TIL UTAU doesn't support Norwegian lettering. Oof.

nonino44441 · Mar 6, 2021

Hey, my question may be stupid, but... why isn't anybody doing it? I'm sure some people on this forum have programming skills, so why isn't anybody working on a prototype (at least something small like a new UI) and sending it to Ameya to get them to do something about it?

Kiyoteru · Mar 6, 2021

nonino44441 said:
Hey, my question may be stupid, but... why isn't anybody doing it? I'm sure some people on this forum have programming skills, so why isn't anybody working on a prototype (at least something small like a new UI) and sending it to Ameya to get them to do something about it?

Please check this thread: https://utaforum.net/threads/utsu-a-cross-platform-vocal-synth-frontend.17476/

Oxygen Dioxide · Jun 1, 2021

In my opinion, the biggest limitation of UTAU is its engine API, which only supports "single note" synthesis. That means the engine can only synthesize one note at a time, and by design it can't access the other notes. This prevents AI engines from being implemented on UTAU, because AI engines train and synthesize by sentence and have to know a whole sentence before they can synthesize it. As a result, NEUTRINO doesn't support UTAU; it only supports MusicXML input. If a new frontend is developed just as "a better UTAU", it will never be the successor of UTAU.

Actually, the UTAU API (by design) can't even meet Moresampler's needs, because Moresampler has to do some post-processing (converting LLSM back to audio) after finishing a song. So it uses a "hack": it opens the script to check whether the current note is the last one.
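The difference between a single-note API and a sentence-level one can be illustrated with a toy Python sketch. UTAU actually invokes its resampler as a separate program once per note; the classes below are hypothetical and only model the shape of the two interfaces. Peak normalization stands in for any step that needs the whole phrase, such as Moresampler's end-of-song post-processing: a sentence-level engine can trivially loop over notes, but a per-note engine never gets the information that step requires.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Note:
    lyric: str
    length_ms: int


class PerNoteEngine(Protocol):
    """UTAU-style: called once per note and never shown the neighboring notes."""
    def render_note(self, note: Note) -> list[float]: ...


class SequenceEngine(Protocol):
    """Sentence-level: receives the whole phrase before producing any audio."""
    def render_sequence(self, notes: list[Note]) -> list[float]: ...


class ConstantEngine:
    """Toy per-note engine: one sample per millisecond at a fixed level."""
    def render_note(self, note: Note) -> list[float]:
        return [0.5] * note.length_ms


class NormalizingSequenceEngine:
    """Toy sentence-level engine wrapping a per-note one.

    It renders every note, then runs one pass over the whole phrase
    (peak normalization here). That final pass needs all notes at once,
    which a single per-note call can never provide.
    """
    def __init__(self, inner: PerNoteEngine) -> None:
        self.inner = inner

    def render_sequence(self, notes: list[Note]) -> list[float]:
        out: list[float] = []
        for note in notes:
            out.extend(self.inner.render_note(note))
        peak = max((abs(x) for x in out), default=0.0)
        return [x / peak for x in out] if peak else out
```

Usage: `engine: SequenceEngine = NormalizingSequenceEngine(ConstantEngine())`. Wrapping in this direction is easy; going the other way (running a sequence engine through a per-note interface) would need every call to somehow see the whole song, which is exactly the workaround described above.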

The successor of UTAU should be open-source, compatible with existing voicebanks in its own way (like DeepVocal, which supports CVVC), and have an extensible engine API.

Here is my article about this issue (in Chinese): https://www.bilibili.com/read/cv10500223

There is a new frontend called "Infinity" in development. Here is a video: https://www.bilibili.com/video/BV16i4y1u7iF
It will probably be "the successor of utau" in my opinion after its release.

Kazumimi · Jun 19, 2021

Hmm... ultimately there are only a few features I really care about having implemented in UTAU. I'm going to be unpopular and say I actually like the UI, and I can't say I'm fond of the UI that other "UTAU" software uses. What I want to see in UTAU isn't a UI overhaul, but rather some quality-of-life things, i.e. an envelope editing method more like UTAU-Synth's, as well as parameters for adjusting the flags. Also, real-time/instant rendering would be nice, but I don't want to be locked into using only one resampler. AI synthesis would be cool; I'd love to hear what a Tei or Teto AI would sound like, hehe.
Also, Vocaloid/SynthV style English (as in, you enter the word(s) and it puts the phonemes in for you) would be nice.
