Thoughts on Ameya's possible UTAU update tweets?

LilyoftheValley

Your local flower girl
Defender of Defoko
I haven't seen anyone make a thread on this yet; if there is one, throw it down below.

Our lord and savior has returned with tweets showing an update/translation to UTAU. It seems to be just a German translation, but it has new features? Anyone understand these tweets better? What is actually being done? Is it an actual update? Thoughts? Speculations?
Discuss and stuff

https://twitter.com/ameyaP_

DvGsuokUUAQMuu0.jpg
Du8NcXRU0AEocuu.jpg
Du2h0MXVsAAXzx9.jpg:large
DvBIN7cU0AEGJ_A.jpg
 

Avalia-Kasa

probably a potato tbh
Supporter
Defender of Defoko
i'm so excited for this update you have no idea!!! im VERY sure its an update coming soon

things i think will be features:
  • shareware UTAU (probably not freeware but i might be wrong) will have the folder alias - from what i can tell, putting a folder alias marker in a note (i'm guessing # is the trigger) will change the subfolder the voicebank is using, for easier multiexpression
  • POSSIBILITY of end consonant oto configuration being added which is!! SO AMAZING!! FOR NON-CV LANGUAGES!!!! AAA?? (though this isn't explicit and we can only speculate based off the cropping of the pictures)
  • you can have subfolders inside of subfolders now so instead of having banks with (Soft-A3), (A3), and (Power-A3) you can have (Default) (Soft) and (Power) folders with the pitches inside those!!
  • judging by the sheer volume of features from this update i'm willing to bet there will be either a change to the oto limit or subfolders will be a way of getting around it... but dont quote me on this one
im so hyped!! you have no idea how excited i am aaaaaAAAAA??
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko
shareware UTAU (probably not freeware but i might be wrong) will have the
Since all of the features in freeware still exist in shareware, we don't actually know for sure which version the update will be for. The screenshots feature the shareware version, yes, but it would probably be concerning if a software developer didn't have full access to every feature they were still working on.
 

Avalia-Kasa

probably a potato tbh
Supporter
Defender of Defoko
Since all of the features in freeware still exist in shareware, we don't actually know for sure which version the update will be for. The screenshots feature the shareware version, yes, but it would probably be concerning if a software developer didn't have full access to every feature they were still working on.
i'm basing my assumption on this screenshot
DvBIN7cU0AEGJ_A.jpg

while it may still be freeware that will be able to use markers for changing the expression, i think the use of auto vcv is an interesting choice of screenshot and... tbh? there's not much reason to get shareware utau since vcv plugins exist already, so it might be a good incentive to get it if ameya does indeed decide to put it on shareware only o:
after all, freeware still has the suffix broker
 

Avalia-Kasa

probably a potato tbh
Supporter
Defender of Defoko
ok rollback? i guess?

thing i posted last time: strength feature for how much the dynamics change with vibrato (which i'm hoping will also mean more dynamic support than just envelopes)
Dva0IrLVAAANLop


new thing?? oooo crossfade optimization??? tell me more ameya >.>
DvkxfwVU0AI9rWm
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko
I'm very intrigued by this feature and I'm hoping that there are other people I can discuss this in-depth with. If you don't mind, I'll be copying some of the thoughts I've already shared elsewhere, with some edits to clarify the thoughts and make them more legible.




Why not yellow though?
(friend) I love that mint green for second consonant.
I disagree, it should be yellow. Every parameter should have its own color. Overlap and endcons could get confused easily.

(friend) maybe it's good for Chinese?
Complete Chinese syllables in vocalsynth (cvvchinese, vocaloid phonemes, etc) end up with really cyva-ish pronunciation (ie. overpronounced)

Honestly though, I'm not sure how this fits into our current knowledge of OTO and reclist theory.
That just takes away the biggest advantage of CVVC, which is that you're able to mix and match CVs and VCs.
A voicebank with:
ka ak ta at pa ap (CV and VC samples)
VS:
kak kat kap tak tat tap pak pat pap (complete CVC samples)
There's a big difference in the efficiency of the approach
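
To put rough numbers on that difference, here's a minimal sketch that builds both sample sets. The `sample_counts` helper and the six-consonant, three-vowel figures are only illustrative assumptions (they reuse the k t p g d b / a i u inventory), not anything UTAU itself defines.

```python
from itertools import product

consonants = ["k", "t", "p"]
vowels = ["a"]

# CVVC style: one CV and one VC sample per consonant/vowel pair
cv_samples = [c + v for c, v in product(consonants, vowels)]
vc_samples = [v + c for v, c in product(vowels, consonants)]

# Complete CVC: one sample per onset/vowel/coda combination
cvc_samples = [c1 + v + c2 for c1, v, c2 in product(consonants, vowels, consonants)]

print(cv_samples + vc_samples)  # 6 samples
print(cvc_samples)              # 9 samples


def sample_counts(n_consonants: int, n_vowels: int) -> tuple[int, int]:
    """Hypothetical helper: (CV + VC count, complete CVC count)."""
    cv_plus_vc = 2 * n_consonants * n_vowels
    full_cvc = n_consonants * n_vowels * n_consonants
    return cv_plus_vc, full_cvc


# Six consonants and three vowels: CV+VC grows linearly per consonant,
# complete CVC grows with the square of the consonant count.
print(sample_counts(6, 3))
```

The gap widens quickly: the CV+VC set grows with consonants times vowels, while the complete CVC set picks up an extra factor of the consonant count.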

With this you need a complete CVC syllable, unless people appropriate it for VC otos. I'm just not sure how yet.
If you wanted to control the timing of a diphthong ending in a consonant, you could use endcons to ensure that the consonant is unstretched while affecting the length of the vowel portion,
but in envelope terms that implies something like "increase preutterance and lower STP"???
The preutterance parameter is the one that determines the difference between the actual start of the audio for that note and the notated beginning in the score.

DweINvVUcAADldX.jpg:large


Here's my theory for the envelope. I'm curious about the danger zone: would this have to be considered an intentional feature? I mean, it wouldn't be the overlap if the notes didn't overlap at that point, but in this case you'd be losing some of the consonant that you're trying to preserve with endcons.

One way that this could be used would be like a reverse VCV, where instead of using vowels as the blending point, it's consonants. Like a CVVC Japanese voicebank except grouped as CVCs. Sure, it's inefficient, but VCV is inefficient too.
ex. consonants are k t p g d b and vowels are a i u. Therefore a reclist with every CVC would look like
kakatakapakag tatapatagatad papagapadapab gagadagabagak dadabadakadat babakabatakap kikitikipikig titipitigitid pipigipidipib gigidigibigik didibidikidit bibikibitikip kukutukupukug tutuputugutud pupugupudupub gugudugubuguk dudubudukudut bubukubutukup
To OTO "kakatakapakag" you would split it into these units: [kak][kat][tak][kap][pak][kag]
A VCV list using the same phonemes (k t p g d b / a i u) would look like this:
kakakika kikikuki kukukaku tatatita titituti tututatu papapipa pipipupi pupupapu gagagiga gigigugi gugugagu dadadida dididudi dududadu bababiba bibibubi bububabu
And then of course "kakakika" splits into [- ka][a ka][a ki][i ka]
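
Both splits can be sketched in a few lines. This assumes every consonant and vowel is a single romaji letter (true for k t p g d b / a i u, not for real kana romaji like "shi" or "tsu"), and the function names `split_cvc` and `split_vcv` are made up for illustration.

```python
def split_cvc(line: str) -> list[str]:
    """Overlapping CVC units: 3-letter windows that advance one syllable (2 letters)."""
    return [line[i:i + 3] for i in range(0, len(line) - 2, 2)]


def split_vcv(line: str) -> list[str]:
    """VCV aliases: each CV syllable prefixed by the previous vowel ('-' at the start)."""
    syllables = [line[i:i + 2] for i in range(0, len(line), 2)]
    aliases = []
    previous_vowel = "-"
    for syllable in syllables:
        aliases.append(f"{previous_vowel} {syllable}")
        previous_vowel = syllable[-1]
    return aliases


print(split_cvc("kakatakapakag"))
print(split_vcv("kakakika"))
```

Each CVC unit shares its final consonant with the start of the next unit, which is the "consonants as the blending point" idea, mirroring how each VCV alias shares a vowel with its neighbour.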

Just in terms of Japanese, imagine a voicebank where the input is like this: [- か][a えr][るn][の][o うt][たg][が-]
Sure, you could just use VCV, but then you'd need a separate [a が] and [a -] sample. So this particular approach to CVC samples is to maximize the context.

Of course, all of this is completely pointless if the OTO limit hasn't gone up from 2^15 lines.
 

Sylveranty

Ritsu's Renegades
Defender of Defoko
This is quite an interesting feature to be incorporated, but I also wondered how it should be properly used.

From how I understand it currently, the pink field basically tells the resamplers to not stretch/loop this part, the white part gets stretched/looped, and the new green part would then be incorporated into the note when the note is nearing its end without being stretched or looped.
And as Kiyoteru has said, that would mean you'd need a complete CVC syllable to use this feature. Every recording would be kinda closed in itself, starting and ending in what the recording comes with. If it could be specified that "this is the ending [ot]" and this ending [ot] would always be played when the resampler reads it in any combination of text in the note, e.g. "rot" automatically taking [ro] and [ot] out of their recordings, that would probably be pretty neat.
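
As a rough model of the pink/white/green reading above, here's a sketch of how a note's length might be divided up. This is purely an assumption about how the feature could behave, not how UTAU or any resampler actually implements it; `stretch_plan` and its parameters are hypothetical names, and preutterance and overlap are ignored.

```python
def stretch_plan(sample_ms: float, fixed_ms: float, endcons_ms: float, note_ms: float) -> dict:
    """Split a note into unstretched head, stretched middle, and unstretched tail.

    fixed_ms   : length of the pink (fixed consonant) region at the start
    endcons_ms : length of the green (end consonant) region at the end
    Everything in between is the white region that gets stretched or looped.
    """
    source_middle = sample_ms - fixed_ms - endcons_ms
    target_middle = note_ms - fixed_ms - endcons_ms
    if source_middle <= 0:
        raise ValueError("no stretchable region left in the sample")
    if target_middle < 0:
        raise ValueError("note is shorter than the two unstretched regions")
    return {
        "head_ms": fixed_ms,
        "middle_ms": target_middle,
        "tail_ms": endcons_ms,
        "stretch_ratio": target_middle / source_middle,
    }


# A 600 ms recording with a 120 ms onset and an 80 ms end consonant,
# sung on a 1000 ms note: only the 400 ms vowel portion is stretched.
plan = stretch_plan(600, 120, 80, 1000)
print(plan)
```

Under this model the vowel absorbs all of the length change, so a very short note could leave no room for it at all, which is where the sketch raises an error.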

If however this indeed means you need to have every possible C(CC)V with every possible VC(CC) of the language, that'd be a monster of a voicebank.
German has up to 15 single vowels, not counting diphthongs and counting schwa and a-schwa as their own phonemes.
There are around 24 consonants that can stand in the starting position of a syllable, and around 13 that stand in the coda; consonant clusters aren't even considered yet. You easily end up with at least 4000 lines or recordings, even if you scratch the a-schwa. And again, that is without consonant clusters. Adding CCV to that, the number of recordings at least doubles. This still excludes ending clusters, though I'd put those together out of [VC]+[CC] either way. Having 8000 to 10,000 recordings for one pitch is quite overwhelming and I wouldn't be a fan of it.
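
For anyone who wants to check the arithmetic, here's a quick sketch using the approximate phoneme counts given above. The 2^15 figure is the OTO line limit mentioned earlier in the thread, and the "doubling" for CCV is taken at face value as a lower bound.

```python
onsets, vowels, codas = 24, 15, 13

full_cvc = onsets * vowels * codas               # every onset + vowel + coda
without_a_schwa = onsets * (vowels - 1) * codas  # dropping the a-schwa
with_ccv_low = 2 * without_a_schwa               # "at least doubles" with CCV
with_ccv_high = 2 * full_cvc

OTO_LIMIT = 2 ** 15  # limit quoted earlier in the thread

print(full_cvc)         # 4680
print(without_a_schwa)  # 4368
print(with_ccv_low, with_ccv_high)  # 8736 9360
print(OTO_LIMIT // with_ccv_low)    # pitches that would fit under the limit: 3
```

So the "at least 4000" and "8000 to 10,000" estimates hold up, and a bank of that size would only fit about three pitches under the old limit.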

My numbers can of course be totally wrong, I'm never sure with the logic I apply to maths. I'd love to hear more about this feature and how it works in detail, so that the community can start to figure out how it can be applied or how we have to change our thinking to cleverly use it.
 

Soursop the fruit

✧ Fruity & Happy ✧
Defender of Defoko
This feature could be used as an extra in an English bank to quickly provide frequently used CVC (or one-syllable) words and reduce copy-pasting notes, plugin usage, or building the word from scratch. The user can create a note, set its length, and insert the word (ex: [through] [heart] [will]).

I'm curious what the "Endkonsonant" button is for though; I hope Ameya will tell us about it.
 

wobinbug

Ritsu's Renegades
Defender of Defoko
Sorry this ended up so long, I just wanted to give some of my own insight and speculation into the CVC otoing options Ameya's working on as I've always wanted VC otoing and it being CVC makes it all the more interesting. Basically, the tl;dr is that I think it's very cool, but, perhaps, a bit gimmicky and I see more uses for it in the context of VC sounds than CVC sounds.
---
I feel like how useful the CVC otoing will be is going to depend on your intention. For example, if you've created a 2-mora CVVC VB ('がが' 'けけ' etc.) you could probably use the recordings to make ending CVs ('が -' 'け -' etc.) and omit the typical end breaths like 'a -' without having to do anything too crazy with your otoing (this would also work with CV now that I think about it) since you could still easily blend 'け' 'e g' 'が -' together.

On the other hand, if you've created a VCV or higher mora CVVC bank, creating ending consonants like this would be highly inefficient because you'd have to record so many extra sounds (although, you could probably make it work if you wanted to add 'v c' otos to your VCV bank, alternatively 2-mora VCV would work if you really want) so you're probably better off just sticking with the standard 'a -' ending breaths instead. That said, being able to give the ending breaths a proper VC oto may help prevent breaths getting stretched or distorted in UTAU, so it will still be useful.

Of course this is just in the context of Japanese and currently existing methods which, of course, don't account for a feature that was completely non-existent initially, and maybe we'll start seeing things like this
Imagine a voicebank where the input is like this: [- か][a えr][るn][の][o うt][たg][が-]
which I'm actually pretty excited to think about and could provide some quality results, but this is, of course, all speculation, and only time and testing can say where this'll lead.

In the context of languages like English or German, I think the option for CVC otos could be incredibly useful, especially with regards to otoing the ending sounds (stuff like 'est' 'eks') though I'm not sure how a full 'CVC' otoed sound would be utilised, though, granted, I have considerably less experience with non-Japanese vbs. I feel like @Sylveranty pretty much sums up my thoughts for these languages as well regarding CVC otos:
German has up to 15 single vowels, not counting diphthongs and counting schwa and a-schwa as their own phonemes.
There are around 24 consonants that can stand in the starting position of a syllable, and around 13 that stand in the coda; consonant clusters aren't even considered yet. You easily end up with at least 4000 lines or recordings, even if you scratch the a-schwa. And again, that is without consonant clusters. Adding CCV to that, the number of recordings at least doubles. This still excludes ending clusters, though I'd put those together out of [VC]+[CC] either way. Having 8000 to 10,000 recordings for one pitch is quite overwhelming and I wouldn't be a fan of it.

Regarding languages such as Spanish and Korean, however, I think CVC otos could be pretty revolutionary (being able to oto sounds like 'geul' for example) because they are, to my understanding, a lot less taxing than languages like English or German regarding potential CVC sounds. Of course, this could still run the risk of being inefficient, but singling out the most common CVC sounds and still combining CV with VC for the remaining sounds could result in a pretty nifty voicebank (you could probably make this argument for languages like English or German too, but I could still see that being immensely taxing in comparison).

I also like Soursop's idea
This feature could be used as an extra in an English bank to quickly provide frequently used CVC (or one-syllable) words and reduce copy-pasting notes, plugin usage, or building the word from scratch. The user can create a note, set its length, and insert the word (ex: [through] [heart] [will]).
I already see people adding things like numbers to their VBs, and I think creating a reclist of the most common words used in songs to go alongside your standard reclist (whatever your preference) could be a really neat use for CVC otoing and speed up the editing process (inb4 someone records a whole dictionary for the meme value). Of course we'll have to wait to see how well something like this actually works in practice.
---
Edit: Tidied things up with spoilers
 

sangv

Ruko's Ruffians
Defender of Defoko
This feature could be used as an extra in an English bank to quickly provide frequently used CVC (or one-syllable) words and reduce copy-pasting notes, plugin usage, or building the word from scratch. The user can create a note, set its length, and insert the word (ex: [through] [heart] [will]).
That's a pretty good idea. Would anyone actually be interested in having a list of commonly used single-syllable words that they can pick and choose from to record if the CVC feature was released? Because if so, I might attempt to compile a list like that myself somehow, by maybe going through the lyrics from some songs in the Billboard top 100 from 2000 to 2018. Kind of sounds a bit daunting, but I guess it'd be worth a try, since I can't really find many good lists of frequently used words in songs out there at the moment.
 

UtaJoule

Teto's Territory
Defender of Defoko
Very late reply but I think the End Consonant function will be VERY useful for making CV-C English voicebanks without the need to crossfade diphthongs like long I, long A, long O, "8" in CZ's phoneme symbols, etc. And you could easily just put the End Consonant where the vowel changes.

So, like, instead of needing the crossfades like [-I] [I-], [-A] [A-], etc, you can simply just go [I], [A]

"Welcome to Wendy's, may I take your order?" would be...
[we] [l] [ku] [m] [to] [we] [n] [dE] [z] [mA] [tA] [k] [y0r] [0r] [d3] (obviously I'm using CZ phonemes XP)

Instead of...
[-we] [e l] [l k] [ku] [um] [m t] [to] [o w] [we] [en] [n d] [dE] [Ez] [mA] [AI] [I t] [tA] [Ak] [y0] [0r] [_0] [0 r] [r d] [d3] [3-]
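
To make the saving concrete, here's a small sketch that counts the notes in each version. The two strings are copied from the lines above; `count_aliases` is just an illustrative helper that counts bracketed aliases.

```python
import re

cv_c = "[we] [l] [ku] [m] [to] [we] [n] [dE] [z] [mA] [tA] [k] [y0r] [0r] [d3]"
cvvc = ("[-we] [e l] [l k] [ku] [um] [m t] [to] [o w] [we] [en] [n d] [dE] [Ez] "
        "[mA] [AI] [I t] [tA] [Ak] [y0] [0r] [_0] [0 r] [r d] [d3] [3-]")


def count_aliases(sequence: str) -> int:
    """Count the bracketed aliases (one per note) in a lyric sequence."""
    return len(re.findall(r"\[[^\]]*\]", sequence))


print(count_aliases(cv_c))  # 15
print(count_aliases(cvvc))  # 25
```

That's 10 fewer notes to enter and tune for a single short sentence.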

And as someone who loves really good CV voicebanks, CV English like this would just make me die from happiness. Plus, it would make English for new users much easier!
 
