How to Create Your First Voicebank: Part 2

This resource is intended as a walkthrough for creating your first UTAU voicebank. It is aimed at absolute beginners and does not assume any prior knowledge of UTAU. This is a crosspost from my website; that version will be the most up-to-date if any changes to this are made.

The text was also slightly too long for a single post, so I've split it into two parts. See here for part one.

Using Your Voicebank
UTAU Project Files

For this demonstration, I've prepared two premade USTs / USTXs you can try out your voicebank with; they do not have any quirks that you might find with other premade USTs which would make them difficult for testing. You can download them here if you haven't already.

UST (UTAU sequence text) is the filetype used by classic UTAU; USTX is the equivalent used by OpenUTAU. OpenUTAU can also import USTs, however, and both programs can import MIDI files.

Inside the provided folder, you'll find several files, notably two .ust files and two .ustx files, one for each song.

Kaeru no Gasshou (Frog Chorus) is a short children's song. This UST is untuned and good to use for a quick test of the audio or the oto.

Sakura Sakura (Cherry Blossoms) is a folk song that's about 30 seconds long. This UST is tuned, and good for doing a more polished test render.

If your voicebank isn't singing in the provided USTs, or if the audio has a harshness to it, and you didn't already test it earlier, see here for testing and troubleshooting.

I've included the lyrics to both songs plus MIDI files if you want to practice making your own USTs (I will cover this in-depth in a future tutorial). I've also included some example renders using the demo voicebank to give you an idea of what the songs should sound like, as well as the difference in sound between a CV voicebank and a CVVC voicebank.

I've also made some test renders using other premade USTs to give you an idea of how different tuning styles can affect how the voicebank sounds. The CV demo is using a UST made by ZettaSloooow for 'Tiger Rampage' by sasakure.UK, and the CVVC demo is using a UST made by Anemone for 'No Logic' by JimmyThumb-P.

All four of these demos were rendered in OpenUTAU with all expressions default and using moresampler as the resampler. The CV renders used the default phonemizer and the CVVC renders used the JA VCV & CVVC phonemizer.

Working In UTAU
Inside of UTAU, navigate to File(F) > Open(O) and find the UST file. Double click on it, or select it and hit Open. If you don't have the voicebank in the UST's data in your voice folder, it will open up the Project Configurations menu automatically. Like before, find the name of your voicebank in the dropdown menu. If this window doesn't pop up, you can go to Project(P) > Project Properties(R) to open it manually, or click on the voicebank name in the top left corner.

This window is also where you can set the flags, resampler, and wavtool, as were covered in Intro to UTAU. Leaving these as default is fine for testing, though you'll likely want to try out different combinations and find which one you feel gives you the best sound (which often won't be the default resampler).

If your voicebank is recorded at a higher pitch, such as in a typical feminine range, you'll likely want to transpose the notes up an octave, as they were made to work with a mid-ranged male singer. To do this in classic UTAU, hit CTRL+A to select all the notes, then navigate to Edit(E) > Move Region By Number(M), enter 12 into the box that pops up, and click OK. Now the notes will be in a better range for a typical female or higher-voiced UTAU.
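If you're curious what that transposition does to the file itself, here is a minimal Python sketch. It assumes the UST stores each note's pitch on a line of the form NoteNum=<MIDI note number>, so moving up an octave means adding 12 to each value. The function name and the two-note snippet are made up for illustration; this is not how UTAU itself implements the menu command.

```python
import re

def transpose_ust(ust_text: str, semitones: int = 12) -> str:
    """Shift every NoteNum value in UST text by the given number of semitones."""
    def shift(match: re.Match) -> str:
        return f"NoteNum={int(match.group(1)) + semitones}"
    # MULTILINE lets ^ match at the start of every line, not just the first.
    return re.sub(r"^NoteNum=(\d+)", shift, ust_text, flags=re.MULTILINE)

# Hypothetical two-note snippet; real USTs carry more fields per note.
example = (
    "[#0000]\nLength=480\nLyric=か\nNoteNum=60\n"
    "[#0001]\nLength=480\nLyric=え\nNoteNum=62\n"
)
print(transpose_ust(example))
```

Using the menu command inside UTAU is still the safer route, since it won't touch the file's encoding or any other fields.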

Select the notes of the UST you want to play, or hit CTRL+A to select everything, then hit the play button in the top bar or SPACEBAR on your keyboard to render it and listen to the playback. If you've done everything correctly up until this point, it should sing the song with no problems.

Working In OpenUTAU
Inside of OpenUTAU, navigate to File > Open, and find the USTX file. Double click on it, or select it and hit Open. To select your voicebank, you'll first want to make sure that OpenUTAU is also looking in your UTAU voice folder for voicebanks. If this is set up correctly, the name of your voicebank should show up in the singer dropdown menu. If you want to change the render settings, click the gear icon. As before, the default resampler (in this case, WORLDLINE-R) is usually fine for testing, though it may not sound as good for your voicebank as other resamplers.

Since this is a CV UST and we're using a CV voicebank, we shouldn't need to change the phonemizer, but if you want to select one just to make a habit of it, I'd recommend the JA VCV & CVVC phonemizer, since it's pretty universal, especially if you plan to add in either the initial CVs or the VCs and VVs to the voicebank later on.

If your voicebank is recorded at a higher pitch, such as in a typical feminine range, you'll likely want to transpose the notes up an octave, as they were made to work with a mid-ranged male singer. To do this in OpenUTAU, hit CTRL+A to select all the notes, then navigate to Batch Edits > Move an octave up. Now the notes will be in a better range for a typical female or higher-voiced UTAU.

Hit the play button at the top, or SPACEBAR on your keyboard, to play the UST. It may pause to render more of the track if it hasn't fully rendered before you hit play. Once again, if you've done everything correctly, it should sing without issue.

Common Problem with Using Premade USTs
If you load a UST and it's not playing, and you've already verified that your voicebank works correctly on the demo USTs, this almost certainly has to do with lyric input. Your voicebank is a Japanese CV voicebank that is aliased with kana characters. If you try to use it with a UST that is built for a different type of voicebank, it will not work correctly, because the lyric input must match the sample aliasing. This means it will not work correctly with USTs that are made for:

VCV voicebanks
CVVC voicebanks
Voicebanks with romaji aliasing
Voicebanks with alias suffixes

This is because the samples the UST is telling UTAU to look for do not exist in your voicebank.
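To see which lyrics are the problem, you can compare the lyrics in a UST against the aliases in your oto.ini. The sketch below is a rough illustration, not an official tool. It assumes oto.ini lines look like filename.wav=alias,offset,consonant,cutoff,preutterance,overlap, that a sample with an empty alias is looked up by its file name, and that rests are written as R. All function names and sample values are hypothetical.

```python
def oto_aliases(oto_text: str) -> set:
    """Collect every alias a voicebank can answer to from oto.ini text."""
    aliases = set()
    for line in oto_text.splitlines():
        if "=" not in line:
            continue
        filename, params = line.split("=", 1)
        alias = params.split(",", 1)[0]
        # Assumption: an empty alias falls back to the file name without its extension.
        aliases.add(alias if alias else filename.rsplit(".", 1)[0])
    return aliases

def ust_lyrics(ust_text: str) -> list:
    """Return the lyric of every note in UST text, in order."""
    prefix = "Lyric="
    return [line[len(prefix):] for line in ust_text.splitlines() if line.startswith(prefix)]

def missing_lyrics(ust_text: str, oto_text: str) -> list:
    """List lyrics (other than rests) that have no matching alias."""
    known = oto_aliases(oto_text)
    return sorted({lyric for lyric in ust_lyrics(ust_text) if lyric != "R" and lyric not in known})

# Hypothetical data: a kana CV oto and a UST mixing CV, VCV, romaji, and suffixed lyrics.
oto = "か.wav=,10,50,-100,30,10\nさ.wav=さ,10,50,-100,30,10\n"
ust = "Lyric=か\nLyric=a か\nLyric=sa\nLyric=かA4\nLyric=R\n"
print(missing_lyrics(ust, oto))
```

When reading real files, keep in mind that classic UTAU files are often saved as Shift-JIS rather than UTF-8.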

In OpenUTAU, this is usually a pretty easy fix, because there are built-in conversion tools under Batch Edits > Lyrics on the piano roll window. If the problem is that it's made for CVVC, you can usually just delete the VC notes and extend the CVs to fill the gap. Reading romaji lyric input is also not a problem if using the JA VCV & CVVC phonemizer.

If Batch Edits > Lyrics > Remove Tone Suffix doesn't work for cleaning up suffixes in the lyrics, you can go to Batch Edits > Lyrics > General lyrics replacement, type in the suffix you want to remove in the Before box, and hit Apply.
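Conceptually, removing a suffix just means trimming the same trailing text from every lyric. Here is a small Python sketch of that idea. It is only an illustration with made-up names and values, not OpenUTAU's own code, and unlike a general find-and-replace it only trims the suffix when it appears at the end of a lyric.

```python
def remove_suffix(lyrics: list, suffix: str) -> list:
    """Trim `suffix` from the end of each lyric that has it; leave others untouched."""
    if not suffix:
        return list(lyrics)
    return [lyric[:-len(suffix)] if lyric.endswith(suffix) else lyric for lyric in lyrics]

# Hypothetical lyrics with an "A4" pitch suffix, plus a rest.
print(remove_suffix(["かA4", "さA4", "R"], "A4"))
```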

In UTAU, there is unfortunately no such easy fix already in the software, but there is a plugin we can use instead. Iroiro2 is a plugin with a lot of useful features, including lyric conversion. After you've downloaded and extracted the files into a subfolder within UTAU/plugins, open the UST, select all the notes, run the plugin, and select the conversion options you want to apply.

Finishing and Distribution
This step is completely optional — it's perfectly fine to make voicebanks you don't intend to release to the public, whether keeping them for personal use only or only distributing them privately to friends. However, most voicebank developers do plan for public releases, so it's good to know how to go about it.

File Cleanup
First, let's clean up our files. Throughout the configuration and testing process, we've likely ended up with a lot of files sitting in our voicebank that will bloat its size when we try to share it with others.

For starters, we can delete the vLabeler project file and cache file since our voicebank is fully oto'd now. We can also delete the $read file if one has been generated; this just tells UTAU that the voicebank has been opened before.

Next, inside of the folder with all of our audio files in it, we can delete all of the .frq, .llsm, and other such frequency files that may have been generated from voicebank usage or testing. Basically, you can safely delete any file in here besides the .wav files and the oto.ini.

Note:
Depending on the resamplers you've been using, you may have a file in here named desc.mrq; this is fine to leave in the voicebank, as it's storing sample frequency information in a single file rather than many separate files. It doesn't take up a lot of space and can also be converted to other frequency formats. If you have edited any of your frequency files and saved that information to the desc.mrq, you'll definitely want to leave it in there so that other users won't encounter the same frequency errors.
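If you'd rather not pick through the folder by hand, a short Python sketch like the one below can list the files that look safe to remove. It follows the rule described above: keep the .wav files, oto.ini, and desc.mrq, and flag everything else. The function name is hypothetical, and it only lists files rather than deleting them so you can review the list first. If your samples sit in the root folder alongside character.txt, readme.txt, or your icon, those would be flagged too, so add them to the keep list in that case.

```python
from pathlib import Path

KEEP_SUFFIXES = {".wav"}
KEEP_NAMES = {"oto.ini", "desc.mrq"}

def cleanup_candidates(audio_folder) -> list:
    """Return files in the audio folder that are neither samples nor kept config files."""
    folder = Path(audio_folder)
    return sorted(
        path for path in folder.iterdir()
        if path.is_file()
        and path.suffix.lower() not in KEEP_SUFFIXES
        and path.name.lower() not in KEEP_NAMES
    )

# Example usage (prints the candidates without deleting anything):
# for path in cleanup_candidates("path/to/your/voicebank"):
#     print(path.name)
```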

Voicebank Details
If you're at the point where you want to share your voicebank with others, you likely have made a character to go along with it, or at the very least a name you want to distribute it under. As such, we can rename our voicebank folder to reflect the name of the character, specific voicebank, and (optionally) version number, for example [CharacterName]_jpn_v1.0.

Next, let's render a short segment of a UST to function as our audio sample. This isn't required, but many users want to hear an example of what the voicebank sounds like in a raw render — that is, without any mixing or backing instrumental — so that's the purpose this audio file serves. Try to keep this no longer than about 30 seconds, and shorter is better.

You theoretically can use any song you want, but it's better to stick to things you know you have permission to use, including USTs made by other people. As such, I tend to stick to short, public domain songs rendered from USTs I've made myself, such as the demo USTs we used to test the voicebank. You have my permission to use them for your sample if you would like.

Export the .wav file from either UTAU or OpenUTAU and move it into the root folder of your voicebank. Name it something like sample.wav. This can be named anything, but must be a .wav file.
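If you want to double check the length of your sample before packaging, Python's built-in wave module can read it. This is a minimal sketch with hypothetical function names; the 30 second limit is just the guideline from above, and the wave module is only reliable for ordinary uncompressed PCM .wav files.

```python
import wave

def wav_duration_seconds(path) -> float:
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def is_short_enough(path, limit_seconds: float = 30.0) -> bool:
    """Check the sample against the suggested maximum length."""
    return wav_duration_seconds(path) <= limit_seconds

# Example usage:
# print(wav_duration_seconds("sample.wav"))
```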

It's also common for voicebanks to come with an illustration of the character, or at the very least some image that will serve as the voicebank icon, though again this isn't required.

The voicebank icon must be 100 x 100 pixels, named something like icon.bmp, and placed inside the root folder. This can be named anything, but must be a .bmp file.
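A quick way to confirm the icon's dimensions is to read them straight from the file header. The sketch below assumes the common BMP layout where the file starts with BM and the width and height are stored as 32-bit little-endian integers at byte offsets 18 and 22; very old BMP variants use a different header and would not read correctly. The function names are hypothetical.

```python
import struct

def bmp_size(data: bytes) -> tuple:
    """Return (width, height) in pixels from the raw bytes of a BMP file."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    width, height = struct.unpack_from("<ii", data, 18)
    # Height is negative when rows are stored top-down, so take the absolute value.
    return width, abs(height)

def is_valid_icon(data: bytes) -> bool:
    """Check that the image is exactly 100 x 100 pixels."""
    return bmp_size(data) == (100, 100)

# Example usage:
# with open("icon.bmp", "rb") as f:
#     print(bmp_size(f.read(26)))
```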

You can also include a full body illustration, reference sheet, or any other visuals you want; it's common to place these in a subfolder in your voicebank labeled something like art. Remember, though, that users will likely assume anything packaged with the voicebank is theirs to use along with the voice, so don't include any art you don't want other people to use.

If you want to have a full body character portrait to display on the OpenUTAU piano roll, make sure the drawing has a transparent background and place it in either the root folder or a subfolder. Then, open the Singers window in OpenUTAU, load your voicebank, click on the gear icon next to location, and select Set Portrait to find the image. If the character icon isn't showing up in OpenUTAU, you can set it from this dropdown menu as well.

With these things made (or not), we can finish filling out the character.txt. The format for character.txt is like this:

Code:

name=Character Name
version=1.0
author=Name of the Developer
voice=Name of the Voice Provider
image=icon.bmp
sample=sample.wav
web=https://crouton.net/

All of these fields are optional to include.
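Since character.txt is just a list of key=value lines, it's easy to read back and sanity check. Here is a minimal Python sketch that parses the text and picks out the files it refers to, so you can confirm they actually exist in the root folder. The function names are hypothetical and the example reuses the values shown above.

```python
def parse_character_txt(text: str) -> dict:
    """Turn key=value lines into a dictionary, skipping anything else."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            # Split on the first "=" only, so values such as URLs stay intact.
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def referenced_files(fields: dict) -> list:
    """Return the file names the image and sample fields point to, if present."""
    return [fields[key] for key in ("image", "sample") if key in fields]

example = (
    "name=Character Name\n"
    "version=1.0\n"
    "image=icon.bmp\n"
    "sample=sample.wav\n"
    "web=https://crouton.net/\n"
)
fields = parse_character_txt(example)
print(referenced_files(fields))
```

When reading a real character.txt, you may need to open it with the Shift-JIS encoding rather than UTF-8, depending on how it was saved.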

The format for character.yaml, as used by OpenUTAU, is quite similar. All of these can be set within the Singers window, but here's what it looks like in text format:

Code:

singer_type: utau
text_file_encoding: shift_jis
image: icon.bmp
portrait: portrait.png
portrait_opacity: 0.67
portrait_height: 0
default_phonemizer: OpenUtau.Core.DefaultPhonemizer

Note the formatting differences between this and character.txt; character.yaml uses : after each field instead of =.

Lastly, let's talk about the readme.txt. This file should ideally contain information on how to credit and contact you, the author, as well as relevant technical information such as the type of voicebank, language support, recording pitch, reclist used (if non-standard), and any special features the voicebank has. It can also contain character information, such as a short character bio.

Perhaps most importantly, however, this is where you should put your Terms of Use (TOU). Essentially, this is an informal contract between you and the user that outlines what is and isn't allowed to be done with the voicebank itself and any associated assets like character art. These terms are difficult to enforce legally outside the realm of unauthorized commercial usage, but most users will be respectful of voicebank developers' requests.

My friend KLAD has a really great TOU Generator on his website that you can use if you don't know where to start. You can also reference mine if you want to see a slightly different format.

Putting it all together, we should have a root folder that looks something like this:

And our voicebank's in-engine profiles will look like this:

Pressing the "sample" button in either software should play the audio sample we specified in character.txt. If none is specified, it will play a random .wav file from the voicebank.

Packaging
With all of this set up, our voicebank is now ready to be set free into the world! Well, almost.

First, we need to package it. We've already included everything we need, so this is as simple as right clicking on the root folder within the voice folder and compressing it to either a .zip or a .rar file.
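If you end up repackaging often, for example after each update, the same step can be scripted. This is a minimal Python sketch using the standard library's shutil.make_archive; the function name and paths are hypothetical. It produces a .zip only, and it keeps the voicebank's root folder as the top-level entry inside the archive, which matches what you get from compressing the folder by right clicking.

```python
import shutil
from pathlib import Path

def package_voicebank(voicebank_dir, output_dir) -> str:
    """Zip the voicebank folder (including the folder itself) into output_dir."""
    voicebank_dir = Path(voicebank_dir).resolve()
    archive_base = Path(output_dir).resolve() / voicebank_dir.name
    # make_archive appends ".zip" to archive_base and returns the archive's path.
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=voicebank_dir.parent,
        base_dir=voicebank_dir.name,
    )

# Example usage:
# print(package_voicebank("voice/CharacterName_jpn_v1.0", "releases"))
```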

Next, upload the compressed file to some kind of filesharing service, like Google Drive, Mega.nz, Bowlroll, Mediafire, Dropbox, or any other one of these free-to-use services. This will provide you with a link you can give to anyone who wants to download your voicebank. My only recommendation here is to try to avoid the kind of free filesharing sites that will give you and others 5000 viruses, usually proportional to how many advertisements it's flashing at you.

Distribution
How you go about it from here on out is up to you.

You can make a demo song/cover, or demo reel of multiple songs/covers, showing off your UTAU's capabilities, which you can upload to YouTube or SoundCloud or wherever else.

You can create pages for your UTAU on different voicebank databases, like the UTAU wiki, for example, so that people searching for voicebanks can potentially find it. You can even make your own webpage for your UTAU, whether coded from scratch like mine are here on Neocities or made by using a free website builder. Then, you can link this in the "web" section of the character.txt.

You can also simply post about your UTAU on your social media accounts, or advertise it on forums and Discords that allow for self-promotion. Just remember to be courteous to other users; if you're posting in a social space, it's a little rude to just spam your promos and dip.

These are just a few ideas; a lot of the time, word about voicebanks spreads by word-of-mouth or by chance, so just keep doing your thing and eventually other people will start to take notice.

Next Steps
Thanks for reading this walkthrough! It's a bit of a long one, but I hope you've come away from it with a better understanding of how UTAU voicebanks are made and, if you've followed along with me, successfully created a voicebank of your own.

Using the foundation of everything we've covered here, you can go on to create many more voicebanks, and you can always come back and reference this page if you need a refresher on how something works.

Once you feel comfortable recording, configuring, and working with monopitch Japanese CV voicebanks, you can move on to creating and using more complex voicebank types, such as CVVC voicebanks, VCV voicebanks, and multipitch voicebanks, and even branch out into more complex languages like English.

I'm also working on some more in-depth tutorials on recording and otoing strategies aimed at intermediate and advanced users, as well as one on UST creation and other troubleshooting guides. I will link those here when they're done.

If you've successfully rendered, mixed, and uploaded a demo of a voicebank created using this walkthrough and you want to include your UTAU in my voicebank directory, see here.

If you need help with anything else, or just want a beginner-friendly place to chat about vocal synth stuff, I recommend posting to this forum or the St. Defoko's Discord Server.

Happy UTAU-ing!
