Hey all, was curious to know more about voicebanks that aren't voiced by humans (e.g. Defoko, Adachi Rei, etc.). How are voicebanks like this created?
This depends on where exactly the voice comes from, I think? For instance, Defoko's voice came from a TTS program (AquesTalk), so samples were just taken from whatever output the TTS produced and configured into a voicebank. (Where the TTS voice itself came from, however, I'm not too sure; all I know is that it's not from a human.)
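If it helps to see what "configured into a voicebank" means mechanically: UTAU banks pair the WAV samples with an oto.ini file that maps each file to an alias plus timing values. Here's a minimal Python sketch of that step (the folder name, the alias-from-filename convention, and the hard-coded timings are all just assumptions for illustration; real oto values get tuned by hand or with an auto-oto tool):

```python
from pathlib import Path

# Hypothetical folder of per-syllable WAV samples, e.g. "a.wav", "ka.wav",
# exported from a TTS program one syllable at a time.
SAMPLE_DIR = Path("defoko_samples")

# oto.ini format: filename=alias,offset,consonant,cutoff,preutterance,overlap
# (timing values in milliseconds; these are placeholder numbers, not
# values anyone would actually ship).
OFFSET, CONSONANT, CUTOFF, PREUTTERANCE, OVERLAP = 50, 100, -300, 30, 10

lines = []
for wav in sorted(SAMPLE_DIR.glob("*.wav")):
    alias = wav.stem  # assume each file is named after the syllable it holds
    lines.append(
        f"{wav.name}={alias},{OFFSET},{CONSONANT},{CUTOFF},{PREUTTERANCE},{OVERLAP}"
    )

# Note: UTAU traditionally expects Shift-JIS for Japanese banks;
# utf-8 is used here just to keep the sketch simple.
(SAMPLE_DIR / "oto.ini").write_text("\n".join(lines), encoding="utf-8")
```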
As for Rei, her voice came from what were essentially digital noises that didn't resemble human speech whatsoever (unlike Defoko's TTS-based samples); those were eventually modified in Audacity with effects and other processing until they resembled Japanese syllables.
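I don't know Rei's exact Audacity recipe, but to give the general flavor of shaping non-speech noise toward a vowel, here's a generic formant-filtering sketch in Python (numpy/scipy standing in for Audacity's filters; the formant bands are rough values for an /a/-ish vowel, not anything taken from Rei's actual process):

```python
import numpy as np
from scipy.signal import butter, lfilter
from scipy.io import wavfile

SR = 44100
DURATION = 1.0

# Start from plain white noise, standing in for "digital noises"
# that don't resemble speech at all.
rng = np.random.default_rng(0)
noise = rng.standard_normal(int(SR * DURATION))

def bandpass(signal, low, high, sr, order=4):
    """Band-pass filter a signal between low and high Hz."""
    b, a = butter(order, [low / (sr / 2), high / (sr / 2)], btype="band")
    return lfilter(b, a, signal)

# Approximate formant bands for an /a/-like vowel (real formant
# centers vary a lot by speaker; these are ballpark numbers).
formants = [(700, 900), (1100, 1300), (2500, 2700)]
vowel = sum(bandpass(noise, lo, hi, SR) for lo, hi in formants)

# Normalize and write out a 16-bit WAV to listen to.
vowel = vowel / np.max(np.abs(vowel))
wavfile.write("noise_vowel_a.wav", SR, (vowel * 32767).astype(np.int16))
```

The result is still obviously noise, but concentrating its energy around formant frequencies is what nudges it toward sounding vowel-like, which is the same basic idea as EQ/filter effects in an audio editor.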
Each non-human-voiced voicebank was probably made with different techniques, since their sound sources are so different, so there's not necessarily one answer to your question, if that makes sense? I've also heard of some voicebanks being made by producing sounds similar to vowels and then splicing consonants onto the samples, so that's certainly another method beyond what I mentioned above (rough sketch of the idea below!). Hoping all of this makes sense since I'm not really the greatest at wording (oops)
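For that splicing method, here's a minimal sketch of the idea (the input file names are hypothetical, and I'm assuming same-sample-rate WAVs; actual voicebank editing is a lot more careful than this): crossfade the tail of a consonant burst into the start of a vowel sample so the joint doesn't click:

```python
import numpy as np
from scipy.io import wavfile

# Hypothetical input files: a short consonant burst and a sustained vowel.
sr_c, consonant = wavfile.read("k_burst.wav")
sr_v, vowel = wavfile.read("a_vowel.wav")
assert sr_c == sr_v, "samples must share a sample rate to splice directly"

def to_mono_float(x):
    """Average channels if stereo and convert to float for mixing."""
    x = x.astype(np.float64)
    return x.mean(axis=1) if x.ndim > 1 else x

consonant, vowel = to_mono_float(consonant), to_mono_float(vowel)

# 20 ms crossfade between the end of the consonant and the start of
# the vowel, so the splice point isn't an audible click.
fade = int(0.020 * sr_c)
fade_out = np.linspace(1.0, 0.0, fade)
fade_in = np.linspace(0.0, 1.0, fade)

spliced = np.concatenate([
    consonant[:-fade],
    consonant[-fade:] * fade_out + vowel[:fade] * fade_in,
    vowel[fade:],
])

# Normalize and save as 16-bit PCM.
spliced /= np.max(np.abs(spliced))
wavfile.write("ka_spliced.wav", sr_c, (spliced * 32767).astype(np.int16))
```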