Universal Phoneme Set Converter?

socialgutbrain777 · Aug 25, 2019

You can probably already do this with IroIro, but I'm talking about a plugin that converts lyrics from one phoneme set to a completely different one, such as hiragana or romaji to something like X-SAMPA. You'd have to specify the phoneme set you're converting from, and the plugin would have to substitute for phonemes that the target set doesn't include. Does anyone know if a plugin like this exists? If this thread is irrelevant or seems annoying/spam-y, I'll take it down.

Kiyoteru · Aug 26, 2019

One of the problems with something like that is that you'd essentially have to customize the dictionary for the specific reclist format in the original UST and in the target encoding. Converters are common for methods like CV to VCV because the format of the lyrics is already standardized across the community, but a universal converter would have the unwieldy task of handling EVERY possible method.

The other problem, and arguably the more difficult one, is note division. Especially once you start using non-Japanese voicebanks, phonemes per syllable will be divided in different ways for different types of reclists. For example, one voicebank may have [sta] while another may have [s t][ta]. The plugin would have to be equipped to handle every such case of splitting and combining when converting between formats: be able to intelligently read the lyrics and determine the correct splitting point, not get confused by prefixes and suffixes, recognize when certain sequences of notes are supposed to be combined, and so on.
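
To make the note-division problem concrete, here is a minimal Python sketch of the [sta] versus [s t][ta] example. The two layouts, the function names, and the five-vowel list are assumptions for illustration only; real reclists differ and would need their own rules.

```python
# Hypothetical sketch: the two alias layouts mirror only the [sta] vs
# [s t][ta] example above; they are not real reclist definitions.

def as_single_alias(phonemes):
    """Layout A: the whole syllable is one alias, e.g. s,t,a -> ['sta']."""
    return ["".join(phonemes)]


def as_cluster_split(phonemes, vowels=("a", "i", "u", "e", "o")):
    """Layout B: each consonant-to-consonant transition is its own alias,
    and the last onset consonant joins the vowel, e.g. s,t,a -> ['s t', 'ta']."""
    # Position of the first vowel (or end of list if there is none).
    first_vowel = next(
        (i for i, p in enumerate(phonemes) if p in vowels), len(phonemes)
    )
    onset = phonemes[:first_vowel]
    rest = phonemes[first_vowel:]
    # One alias per adjacent consonant pair in the onset.
    aliases = [f"{a} {b}" for a, b in zip(onset, onset[1:])]
    # Final alias: last onset consonant (if any) plus everything from the vowel on.
    aliases.append("".join(onset[-1:] + rest))
    return aliases
```

Here `as_cluster_split(["s", "t", "a"])` yields two aliases where layout A yields one, so a converter would also have to split one note into two (and decide how to divide its length), which this sketch does not attempt.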

socialgutbrain777 · Aug 26, 2019

Kiyoteru said:
The plugin would have to be equipped to handle every such case of splitting and combining when converting between formats: be able to intelligently read the lyrics and determine the correct splitting point, not get confused by prefixes and suffixes, recognize when certain sequences of notes are supposed to be combined, and so on.

Ah. Now I realize that logic hole. That would be a nightmare to code, yikes :^D

Kiyoteru · Aug 26, 2019

Don't give up just yet. Though it seems like a huge challenge, I'd love for people to try and take it on!
As a basis for the plugin, the source code for context-n can be copied: https://github.com/adlez27/context-n/blob/master/context-n.py
Dictionaries could map between specific reclist formats (or the individual voicebank, in the case of context-n's config files) and a universal encoding internal to the plugin: ideally narrow-transcription IPA for the greatest amount of specificity.
Every plugin-internal phoneme could have its own sub-list of every other phoneme in decreasing order of similarity, for substitution purposes.
Users would specify the source encoding, and the voicebank's config would specify the target encoding.
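
A rough sketch of that design in Python, assuming toy data throughout: the three tables, the function names, and the similarity choices below are all made up for illustration and are not taken from context-n or any real voicebank config.

```python
# All tables here are toy, hypothetical data for illustration only.

# Source encoding (romaji lyric) -> plugin-internal IPA phonemes.
ROMAJI_TO_IPA = {
    "ka": ["k", "a"],
    "shi": ["ɕ", "i"],
    "su": ["s", "ɯ"],
}

# Internal IPA -> symbols that a hypothetical target voicebank provides.
IPA_TO_TARGET = {"k": "k", "s": "s", "ʃ": "S", "a": "a", "i": "i", "u": "u"}

# Per-phoneme fallbacks in decreasing order of similarity (hand-picked guesses).
SIMILAR = {"ɕ": ["ʃ", "s"], "ɯ": ["u"]}


def to_target(ipa, ipa_to_target, similar):
    """Return the target symbol for one IPA phoneme, substituting if needed."""
    # Try the phoneme itself first, then its fallbacks in order.
    for candidate in [ipa] + similar.get(ipa, []):
        if candidate in ipa_to_target:
            return ipa_to_target[candidate]
    raise KeyError(f"no target symbol or substitute for {ipa!r}")


def convert_lyric(lyric, source_to_ipa=ROMAJI_TO_IPA,
                  ipa_to_target=IPA_TO_TARGET, similar=SIMILAR):
    """Convert one source lyric to a list of target symbols via internal IPA."""
    return [to_target(p, ipa_to_target, similar) for p in source_to_ipa[lyric]]
```

With these tables, `convert_lyric("shi")` gives `["S", "i"]`: the target has no symbol for ɕ, so the first available substitute, ʃ, is used. The fallback lists here are partial; a full version would rank every other phoneme as described above.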

With that version of the plugin up and running, most of the work for the end-user is done, and it's up to them to manually join and split notes. Then attention can be focused on automating that part too!

After that, the plugin could be updated to automatically recognize the format of the UST instead of the user having to do so.

Ganbatte to anyone interested in making this plugin!

socialgutbrain777 · Aug 26, 2019

Thank you very much! This idea randomly came to me because someone might want to make a VB that doesn't have hiragana/romaji phonemes sing Japanese, and I wondered about an easy way to do that. I'll try looking into Python for potentially scripting some plugins as it's more term specific. I'm also maybe considering learning Java, which I've seen do some neat stuff IMO (for example, Shimeji-ee, which spawns little guys on your desktop).

That being said, I'd probably make this more with LITE voicebanks in mind, but yeah, if I try taking some courses for Python or Java and figure out how exactly a plugin is opened in UTAU, I'll try my hand at this!

Kiyoteru · Aug 26, 2019

socialgutbrain777 said:
figure out how exactly a plugin is opened in UTAU

A plugin takes one argument, which is the path to a temporary UST file. You can see the contents of the temporary UST more easily using the sample plugin included with UTAU when you first installed it. It usually contains only the notes in the selection (and one before and after), but when all notes in the UST are selected, the temporary UST resembles an ordinary UST without the track ending marker. Generally, the temp UST adds some useful additional information (such as the exact path to the voicebank folder, and the specific alias being called by each lyric after prefix mapping). UST files are just INI files, so the easiest way to read and write them is to use some kind of library for handling INI files and making them into arrays and dictionaries.
Once the plugin finishes executing, UTAU will merge the temporary UST back into the main file.
There's some additional information in this thread: https://utaforum.net/threads/utau-plugin-basic-tutorial.18727/
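
As a minimal sketch of that flow in Python: the code below reads the temporary UST, edits each lyric, and writes the file back. It uses a small hand-rolled INI reader instead of a library so that lines without "=" survive a round trip. The function names, the Shift-JIS encoding, and the `Lyric` key name are assumptions here; check them against the temporary UST you see from the sample plugin.

```python
def parse_ust(text):
    """Parse INI-style text into an ordered list of (section_name, dict) pairs.
    Lines without '=' are kept as keys with a value of None."""
    sections = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1], current))
        elif current is not None:
            if "=" in line:
                key, value = line.split("=", 1)
                current[key] = value
            else:
                current[line] = None  # bare line, preserved as-is
    return sections


def dump_ust(sections):
    """Serialize the (section_name, dict) pairs back to INI-style text."""
    lines = []
    for name, entries in sections:
        lines.append(f"[{name}]")
        for key, value in entries.items():
            lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines) + "\n"


def run_plugin(path, convert, encoding="shift_jis"):
    """Rewrite every Lyric entry in the temp UST at `path` using `convert`.
    The encoding and the 'Lyric' key are assumptions about the file layout."""
    with open(path, encoding=encoding) as f:
        sections = parse_ust(f.read())
    for _name, entries in sections:
        if entries.get("Lyric") is not None:
            entries["Lyric"] = convert(entries["Lyric"])
    with open(path, "w", encoding=encoding) as f:
        f.write(dump_ust(sections))
```

A plugin script would call something like `run_plugin(sys.argv[1], my_converter)`, where `my_converter` is whatever lyric-conversion function you write. This sketch edits every section that has a lyric, including the notes before and after the selection, so a real plugin would probably want to skip those.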

socialgutbrain777 · Aug 27, 2019

This is super fascinating! So, that one weird plugin that comes with UTAU when you install it functions as a sort of guide for plugin-making? I had no idea! Thank you for linking me to this info! It should really help.
