How does one create a vocal synth engine?

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Hello, lately I have been asking a lot about UAR files and other things, but that's all for one big cause that I might explain later.
But let me get to the point.
A couple of months ago I created an UTAU called HIME. She was supposed to be a plugin for the vocal synth ALTER/EGO, but due to problems this sadly didn't happen.
That's why we at PROJECT X.I are planning to release a vocal synth engine made for the Belgian anime otaku community.

BUT WE DON'T KNOW HOW TO START OR MAKE THIS!!!

So if you know how, or you know someone who might be able to help us, PLEASE HELP US create this. If you help us, there might be a possibility of your vb becoming one that will be on sale for the engine too!!

Kind regards
Mania and the team of PROJECT X.I
 

DvoraKing

Teto's Territory
Creating a voice synth engine is not something trivial. It requires vast amounts of coding knowledge, mathematics, and a pretty good idea of how the human voice (and acoustics in general) works.

supposed to be a vocaloid for the company ALTER/EGO

Vocaloid and Alter/Ego are distinct.
They are not related in any way. Furthermore, A/E is not a company, it is a voice synthesis engine created by Plogue.
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
Okay, thanks for the explanation.
I'll change the thread, and thank you for showing interest. We know how the human voice works; we had to change HIME's voice 3 times to get a good result, though.
 

DvoraKing

Teto's Territory
We know how the human voice works; we had to change HIME's voice 3 times to get a good result, though.
There's much more to the actual function (and replication/manipulation) of a voice than just "it sounds good/bad." The human voice is actually very complicated.

Also, keep in mind that creating a synthesized voice on a platform/engine that is already established is drastically different from creating an engine itself.

There's no need for one anyway. There are many resampler options for UTAU, and I'm sure you'll be able to find one that suits your voices quite well if you look hard enough.
 

KNΞMΛTCS

Just an UtaForum user
Defender of Defoko
I second the fact that creating an engine on your own is not for the faint of heart - better have a dedicated team of pros to do this one. But if you know what you're doing (or don't mind doing copious amounts of learning), modifying one of the open-source WORLD based resamplers (w4u, the efg series) could be a possibility. If you still insist on making a new engine, here's what you'll have to do:
  • Trim the sample, according to whatever oto.ini alternative you conceive.
  • Shift the pitch of the sample. To do this, you need to figure out the mean pitch of the original sample to work off of. You also have to incorporate pitchbends here.
  • Apply a formant filter. The maths behind this one are above my head, but it essentially keeps away the "chipmunk effect".
  • Blend all those samples together smoothly. This is done by wavtool in UTAU, and you need to overlap the samples, otherwise you'll end up with choppiness.
  • Do all the above smoothly and effectively. I'm sure you could just use an audio SDK of some sort to implement the above, but chances are it will sound horrible. Getting acceptable results will be the big battle for sure.
If you know anyone who's ever programmed a software synthesizer or a VST plugin before, they would be a huge help since they've worked with audio before.
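To make a few of the steps in the list above more concrete, here is a minimal Python sketch of trimming, estimating the mean pitch, and blending two samples with an overlap. It is only an illustration under simple assumptions, not how UTAU or any existing resampler actually does it: the function names, the `start_ms`/`end_ms` parameters (standing in for whatever oto.ini alternative you design), and the plain autocorrelation pitch estimate are all made up for this example.

```python
import math


def trim(samples, sample_rate, start_ms, end_ms):
    """Keep only the part of the recording between start_ms and end_ms."""
    start = int(sample_rate * start_ms / 1000)
    end = int(sample_rate * end_ms / 1000)
    return samples[start:end]


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=800.0):
    """Rough pitch estimate: pick the lag with the strongest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def crossfade_concat(a, b, overlap):
    """Join a and b, blending the last `overlap` samples of a into the start of b."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    blended = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # fade position, strictly between 0 and 1
        blended.append(a[len(a) - overlap + i] * (1.0 - t) + b[i] * t)
    return list(a[:len(a) - overlap]) + blended + list(b[overlap:])


# Toy input: a pure 200 Hz sine stands in for a recorded vowel.
sample_rate = 8000
note = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]

voiced = trim(note, sample_rate, 50, 150)         # keep 100 ms of the recording
mean_hz = estimate_pitch_hz(voiced, sample_rate)  # pitch of the source sample
ratio = 440.0 / mean_hz                           # how far to shift to reach A4
joined = crossfade_concat(voiced, voiced, overlap=80)
print(round(mean_hz, 1), round(ratio, 2), len(joined))
```

A real voice is far messier than a sine wave, so a naive estimator like this will make octave errors on actual recordings. That is part of why getting acceptable results is the hard bit.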

Of course, that's just how you would construct an UTAU-like vocal synthesizer; I'm sure there are other ways to do things. You could even use total synthesis (no samples at all), which would probably sound really robotic but could have potential for FX and such.
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
First things first... creating a synth requires time, effort, and professional training - these are professional tools, not toys for weebs who want to indulge their "famous vocasynth desu!" fantasies. Almost no other attempt at a synth that has come around thus far has gotten past a working alpha/beta stage (due to lack of funds, skill, interest, time, or all of the above). Many start off with less than stellar quality compared to their established competition and eventually fade into obscurity. The only exceptions I can think of are the obvious: UTAU, VOCALOID, A/E, Chip Speech, Cevio, moresampler (though this is an engine for UTAU, not its own full program with UI, etc.), Revivos, and maybe sharpkey - time will tell for this last one (so long as it keeps getting updates and adding features).

UTAU is already established, free to use (though it is shareware, so people who can pay should do so), and has plenty of resources provided by the community:
- artists (aural and visual) are plentiful - if you have money
- programmers and Q/A testing can be provided by volunteers
- reclists can be established/updated to better standards for French, German, and Dutch
- advice can be gotten on most things (just don't ask about mics - this usually doesn't go well, everyone defends their particular mic like it's religion)
- plugins to get around hiccups in vanilla (non-shareware) UTAU (the same cannot be said for UTAU-Synth, unfortunately).

You have all the tools you need to kick off your project without making a synth from scratch. There are plenty of people here who can teach you and your friends how to use UTAU effectively. What you won't find here are people who can teach you how to make a good synth, unless they have a Master's or Ph.D. in all the relevant topics required for that kind of task.
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
Thanks for all the comments so far, but the point is that we don't want to create something in UTAU, and a lot of you told us to use UTAU. We did think of just staying with UTAU because it's the mainstay here. We know programmers, we have 2 of them in our team, but a vocal synth is like a completely new level. BUT WE WANT TO TRY. This doesn't mean that we will not use UTAU anymore; HIME just sounded awful in UTAU and we wanted to have the clearness of Vocaloid.
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
HIME just sounded awful in UTAU and we wanted to have the clearness of Vocaloid

UTAU's quality is generally considered to be roughly on par with Vocaloid, though YMMV. Even AHS were willing to make the voices of Tohoku Zunko's sisters in UTAU, which is to say it can be a tool worth taking seriously in the right hands.

Quality comes down to multiple factors, but the things I would look at are...

- What mic are you using, if any? Mics aren't one-size-fits-all, and a mic suggested for one person might not sound so good for another. Anything halfway suitable will run you at least $70*. With a budget past $100, I would look at XLR options. The saying goes: trash in, trash out. If your mic is poor, then the quality will be poor.
*Mics do decrease in price over time, so it might be good to buy a model a year after its release or wait until sales.

- Related to the mic is the environment the voice is recorded in. Studio quality is impossible/impractical to ask for, but there are ways to make a bedroom a little better suited to recording.

- The programming. A higher quality mic and well done samples are nothing without being programmed well. This is difficult for a lot of people to do well and often takes years of trial and error (some methods are easier than others, such as VCV, but this method can't be used for everything; French, Dutch, and German would be best done in CVVC), but it is most beneficial if you're looking for the best result you can get. Even a lesser quality bank like defoko can sound pleasing to the ear when her flaws are ironed out (add silence before and after sounds, fix the oto; it's a synthesized voice from the outset, so a hyper-realistic sound isn't going to happen, but some people like her artificial timbre).
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
Well, actually HIME will be recorded in a studio because she was meant to become a voice plugin for Alter/Ego, but this didn't happen, so we will continue to make her good enough for another program. So I think the quality will be high enough.
We have a reclist in Dutch that is close to the English Vocaloid one, so it will contain some full words. For Japanese it will be the simple CV reclist with extra sounds.
If our original idea (the engine one) cannot become reality, then HIME will become a UAR that we will probably sell for a decent, cheap price.
The reason we asked a studio (normally we record at my house, where I have my studio) is that we wanted to make it professional.
So yeah
 

HoneyPai

Defoko's Slaves
Defender of Defoko
This is no help with a new engine, but more as studio advice
To get better results, make sure the vp has experience in recording in a professional environment and I'd even throw in vocal training experience, I do not mean a home studio. I mean real deal professional experience
Otherwise, it could go downhill and you will have wasted time and money on the studio
 

na4a4a

Outwardly Opinionated and Harshly Critical
Supporter
Defender of Defoko
There are so many engines (resamplers) already available. Do we really need any more?

A voicebank will only be as good as its samples. If you record with high-end gear, then your voicebank will reflect this.

There are already many toolkits available that can shift a voice up and down in pitch (in a basic sense); you would then need to implement some form of time stretching. Then boom, you have a "basic" resampler.
You'll need to know how to code fairly well in order to pull anything off of course.
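As a rough illustration of that "basic" resampler idea, here is a minimal Python sketch: shift the pitch by reading through the sample faster (which also shortens it), then stretch it back out with a naive overlap-add. Everything here (the function names, the frame size, the linear interpolation) is an assumption picked for the example and not taken from any existing toolkit or resampler, and the result would sound rough on a real voice.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def resample(samples, ratio):
    """Read through `samples` `ratio` times faster, using linear interpolation.

    This raises the pitch by `ratio` but also divides the duration by `ratio`.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    for n in range(int(len(samples) / ratio)):
        pos = n * ratio
        i = min(int(pos), len(samples) - 1)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def time_stretch_ola(samples, stretch, frame=256):
    """Naive overlap-add: change the duration by `stretch`, keeping the pitch."""
    hop_out = frame // 2          # spacing of frames in the output
    hop_in = hop_out / stretch    # spacing of frames in the input
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame) for k in range(frame)]
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    norm = [0.0] * len(out)
    for f in range(n_frames):
        src = int(f * hop_in)
        dst = f * hop_out
        for k in range(frame):
            if src + k < len(samples):
                out[dst + k] += samples[src + k] * window[k]
                norm[dst + k] += window[k]
    # Divide by the summed window weights so overlapping frames do not get louder.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]


# Toy input: one second of a 220 Hz sine at 8 kHz.
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
up = resample(tone, semitones_to_ratio(12))   # one octave up, half as long
restored = time_stretch_ola(up, 2.0)          # back to roughly the original length
print(len(tone), len(up), len(restored))
```

Shifting by plain resampling also moves the formants, which is where the "chipmunk effect" comes from, and naive overlap-add smears the sound because the frames are not aligned to the pitch periods. Pitch-synchronous methods such as PSOLA, or vocoders like WORLD, exist to deal with exactly those problems.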

Your best bet would be to work with what is already available and spend more time on the actual samples.
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
This is no help with a new engine, but more as studio advice
To get better results, make sure the vp has experience in recording in a professional environment and I'd even throw in vocal training experience, I do not mean a home studio. I mean real deal professional experience
Otherwise, it could go downhill and you will have wasted time and money on the studio
The UTAU will be voiced by me. Our team consists of voice actors, singers, artists and musicians, so don't worry about that :3
We also know that there is a difference between a home studio and a music studio. My studio at home has isolation that can give me an echo for my opera covers, but that's not necessary for the HIME UTAU. Also, the old HIME attempts for UTAU were recorded in my home studio. And I'm glad that you started off with the vocal experience, you're the first to ask or comment about it... but please don't worry, haha, I know what we are doing in this project.
We were lucky that we knew people who had studios. We even knew several, but we picked the one whose quality we were sure of.
 

KNΞMΛTCS

Just an UtaForum user
Defender of Defoko
What you're looking for is a full-time, professional voice actor or singer. That's where you should put your money, a better studio will mean nothing.
 

na4a4a

Outwardly Opinionated and Harshly Critical
Supporter
Defender of Defoko
In terms of UTAU, the requirements for a voicer aren't that huge. You should be able to keep a clear, strong, consistent tone. It's a lot less than, say... text-to-speech voices, which require professional voice actors who can record thousands of lines for hours at a time.

You could also have a great voicer but crappy gear and it would still be...well...crap.
Find a balance that works for you.
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
I appreciate your interest, but we are happy with the voice it has now; it is recognised, so yeah... Finding a voice actor in Belgium who has nothing to do with our project feels weird, because that's what the project is about: musical voice and art. It would be bothersome to change or redirect the work we put into it. Okay.
 

DvoraKing

Teto's Territory
Thanks for all the comments so far, but the point is that we don't want to create something in UTAU, and a lot of you told us to use UTAU. We did think of just staying with UTAU because it's the mainstay here. We know programmers, we have 2 of them in our team, but a vocal synth is like a completely new level. BUT WE WANT TO TRY. This doesn't mean that we will not use UTAU anymore; HIME just sounded awful in UTAU and we wanted to have the clearness of Vocaloid.

I don't understand. You want the clearness of Vocaloid, so you're intending to create a new engine from scratch? You realize Vocaloid has decades of research and development behind it, right?

Also, keep in mind Vocaloid is a single engine. A voice that sounds fine using one of the many resamplers available for UTAU may not sound fine on Vocaloid because of that reason.
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
Well, this idea was pretty stupid and we have no interest anymore, so please. Also, we didn't want to make an engine just for the clearness of the voice. If we had done it just for the clearness, it would simply have been an UTAU; we wanted to make an engine for Belgian users. But enough about it, we might have just found another solution. It's starting to become stressful and I don't want to create confusion with this thread.
 

Kiyohime

Ruko's Ruffians
Defender of Defoko
You're Belgian? I have never seen you around I think? Correct me if I am wrong.

Anyway, can you keep me up to date with this project?
 

MANIAGIRLKITTY

Ruko's Ruffians
Defender of Defoko
Thread starter
Well, we are not planning to create a synth from scratch but to use one other than UTAU. Thanks for showing interest anyway.
 
