Questions About AI Voicebanks?

Eternal-Aurath

Momo's Minion
I've recently seen some AI voicebank stuff, and it looks really cool/promising! Aside from the "commercial" or "professional" ones, I've seen a few community-made ones.

I'd really love to experiment with making one myself, but I can't find a good tutorial in English that explains how to do it. I found this tool: https://r9y9.github.io/blog/2020/05/10/nnsvs/ and also this one: https://qiita.com/taroushirani/items/ec16cb9a6b3b691f5e74 but again, I can't quite figure out how to use them.

I've also heard of an AI plugin for UTAU called ENUNU, but I don't know how that's supposed to work either.

I was also wondering if these AI voicebanks are Japanese only due to technological limitations? Or is English possible if the work/time is put in?

I'm not versed in this kind of coding, but I really wanna learn how to do it. AI voicebanks are such a cool concept, and I'd love to try them out. It's a different way of preserving your voice forever and using it creatively lol
 

Halo

Icon by Wanpuccino @ DA
Administrator
Defender of Defoko
It's complicated and I'm not particularly smart, so explaining it in detail/theory is beyond me. I did manage to figure out how to make the resources we have work, though.

^ this resource/tutorial is particularly easy for English speakers to follow to get the basics. It gets you a usable ENUNU bank, and it's easier than you think. The ENUNU gang over on Twitter are pretty nice and eager to make it more accessible, so you can try joining CrazY's Discord group for it or following them on Twitter as you find them:
Mostly this is all in Japanese, but if you're patient and considerate it's fine even if you don't know much.

I do recommend also trying to get NEUTRINO and the tuning tool (https://github.com/sigprogramming/tyouseisientool) to work if you haven't already, since IMHO it's one of the better-documented and more self-contained methods of getting a (free) AI singer to output usable files. It made what I was trying to achieve clearer, which always helps.

I also found this pretty clear regarding ENUNU, though it dropped right after I got everything working by accident haha
(link in case this video doesn't work. :/ the media embed plugin might be borked)
It's basically a way of using UTAU as a GUI for NNSVS (I'm sure that isn't technically accurate, but that's how it fits into the workflow of actually using an AI singer). UTAU is more familiar, and by default generating files from just sheet music + a command prompt is a bit user-unfriendly. With ENUNU, you can open a UST like normal and just do some minor edits to get the same result. I haven't found a way to use the NEUTRINO tuning tool in combination with ENUNU, so be forewarned that using ENUNU means doing pitch editing and tuning in an outside tool like VocalShifter or Melodyne. I do know that if you use an NNSVS database without ENUNU, you can use the tuning tool with minimal issue, which is mega convenient.

Then... for other languages. I know for a fact someone was making a Polish AI VB using NNSVS. No idea how the hell they were making that come together, though configuration should be simple enough. Pix, I think was the username...? If I find the link I'll edit this.
There are English AI singers commercially available/in progress... I have no doubt open-source users are working on it, but yeah, consider it a technological limitation for right now. There'll be a more accessible way sooner or later, and focusing on getting the basics down will probably help you understand why it's a bit harder to get working for a casual user/hobbyist. If you're interested in the voice preservation aspect, you can start recording data now regardless of your ability to train/label/use it comprehensively; as long as it's on pitch and in English (or even better, timed to a MIDI), it should be usable later when you know exactly what you need to do with it.
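
For the "start recording data now" part, here's a rough sketch of how you could keep tabs on a folder of takes while you wait. This isn't part of NNSVS or ENUNU. The function names, the idea of keeping a same-named `.mid` next to each take, and the guess that a trainer will want one consistent sample rate/channel count are all my own assumptions, so treat it as a starting point only.

```python
import wave
from pathlib import Path


def summarize_recordings(folder):
    """Return one dict per .wav file in `folder`, sorted by file name."""
    rows = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            rows.append({
                "name": path.name,
                "sample_rate": rate,
                "channels": wav.getnchannels(),
                "seconds": wav.getnframes() / rate,
                # Assumed convention: take01.wav sits next to take01.mid
                "has_midi": path.with_suffix(".mid").exists(),
            })
    return rows


def find_odd_ones(rows):
    """Names of files whose (sample rate, channels) differ from the most common pair."""
    if not rows:
        return []
    formats = [(r["sample_rate"], r["channels"]) for r in rows]
    usual = max(set(formats), key=formats.count)
    return [r["name"] for r, fmt in zip(rows, formats) if fmt != usual]
```

Point `summarize_recordings` at your recordings folder, print the rows, and re-export anything `find_odd_ones` flags before the pile gets too big.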

Edit: I have also been looking into it in my free time. If you DO find any information regarding English methods, I'm interested in hearing it!
Double Edit: Atsuya, who is one of the more active users in promoting actual use and creation of ENUNU VBs, also made a simple tut that has eng subs!
(link)
 

Eternal-Aurath

Momo's Minion
Thread starter
Yoooo thank you, this is all super helpful!! I'll definitely take a look at all that and see what I can get up and running!
 

Thehyami

Ruko's Ruffians
Defender of Defoko
I am curious about this too! I've looked at NEUTRINO and ENUNU, and I think what makes it hard for other languages is the complex syllables. Japanese syllables are very simple, so there isn't as much to deal with.

I don't know if I'm missing something or not, but I wish ENUNU or NEUTRINO could take the envelope parameters from the UST, so I wouldn't have to make separate recordings for diphthongs :')
 

Thehyami

Ruko's Ruffians
Defender of Defoko
I just checked haru0l's tutorial, and the ENUNU Training Kit only supports Japanese :sad:
 

Eternal-Aurath

Momo's Minion
Thread starter
@Halo I hate to ping you in this after so long, but I finally started trying to learn how to make an ENUNU bank, and I followed that first tutorial you linked up until the part where I have to execute a "Git Bash" command to train the VB. I can't seem to figure out how to do this. Can you point me in the right direction? I tried copy/pasting the command from the tutorial into the command prompt, but it doesn't do anything.
 

Halo

Icon by Wanpuccino @ DA
Administrator
Defender of Defoko
No problem at all, I live here.

As far as I understand, I didn't have to do that; pretty sure it's because I opted to train the VB via Google Colab instead of on my own PC.
The recipe link is in the doc: https://colab.research.google.com/d...CldudtMpZRa?usp=sharing#scrollTo=OoymX9tGoZNR

You follow the numbered instructions there instead of doing the other stuff listed in the doc. If you know a little bit about file structure, it's fairly self-explanatory, but if you need help with it just ping me and I'll try to explain in greater detail.
 
