This tutorial is crossposted from my website. It assumes at least a basic understanding of UTAU and OREMO.
Writing the BGM
The first step to making the guideBGM is, unsurprisingly, to make the BGM itself. You can use any DAW that will let you export .wav files.
There aren't really any restrictions on what the BGM can be, but remember that its purpose is to help the singer match pitch and tempo, like using a tuning fork and a metronome at the same time. In other words, it should have a clear pulse for syllable/mora timing, and the target pitch should be easy to pick out.
Yours does not have to be as bare-bones as mine are, but personally I find that the more "musical" BGMs are harder for me to follow, or become annoying to listen to over time. Others may find that BGMs with more layers or complexity make it easier to feel like they're singing a song.
It's probably best to build the BGM as a MIDI, for two reasons: 1) it ensures the pitch and tempo are exact, and 2) it makes it much easier to export the BGM at a variety of pitches and tempos. The second point may not apply if you're making one for a specific purpose rather than general use, but it's still good to keep in mind.
Here are some general tips:
- You (or whoever is using the BGM) will be listening to it dozens if not hundreds of times; try not to write anything that you'll find irritating to hear over and over.
- Quarter notes in 4/4 time are usually the easiest to count. It can also be helpful to have an obvious downbeat (first beat of each measure).
- While you don't have to keep all instrument voices in monotone, remember to write in the key of the target pitch. Also, harmonies can be distracting and might cause the singer to match to the wrong pitch, so keep the main instrument voice solo.
- Keep in mind the string lengths the BGM will be used to record; you don't typically need an eight-bar tune (32 beats), since most reclist strings are only about 1 to 10 syllables/morae long.
- Make sure to leave some "padding" on either side of the expected recording area to ensure none of the utterance is cut off. One or two measures before and after the expected utterance is plenty. This also gives the vocalist time to read the new string, inhale, and get ready to start singing it; the utterance should not be expected to start on beat 1 of measure 1.
Syllable Timing
The most convenient tempos to use are 100 BPM, 120 BPM, and 150 BPM. This is because the duration of each beat plays an important role in setting up the BGM configuration file, as well as in configuring the oto of any voicebank recorded with it, and these three tempos give us round numbers to work with.
| | 100 BPM | 120 BPM | 150 BPM |
|---|---|---|---|
| Duration of 1 beat (normal time) | 600 msec | 500 msec | 400 msec |
| Duration of 2 beats (half time) | 1200 msec | 1000 msec | 800 msec |
| Duration of 4 beats (quarter time) | 2400 msec | 2000 msec | 1600 msec |
100 BPM in half time = 50 BPM in normal time, 120 in half time = 60 in normal time, and so on.
We don't typically want syllables to be held for less than 400 msec, because that leaves hardly any vowel to work with and can make consonants difficult to articulate. Syllables longer than ~2000 msec are more prone to irregularities in the vowel and really aren't necessary.
As for pitch, that simply depends on the intended pitch of the recordings.
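Every number in the table comes from the same arithmetic: 60,000 msec per minute divided by the tempo gives one beat, and multiplying by the number of beats per syllable gives the hold time. Here is a small Python sketch of that, with the rough 400 to 2000 msec guideline as a range check. The function names are hypothetical, and the range is only the rule of thumb from this tutorial.

```python
# Beat and syllable durations for a guideBGM, derived from the tempo alone.
# Assumption: each syllable/mora is held for a whole number of beats
# (1 = normal time, 2 = half time, 4 = quarter time).

MS_PER_MINUTE = 60_000


def beat_ms(bpm: float) -> float:
    """Duration of one beat in milliseconds."""
    return MS_PER_MINUTE / bpm


def syllable_ms(bpm: float, beats_per_syllable: int) -> float:
    """How long each syllable is held when it spans `beats_per_syllable` beats."""
    return beat_ms(bpm) * beats_per_syllable


def is_comfortable(ms: float, low: float = 400, high: float = 2000) -> bool:
    """Rule of thumb: not much shorter than ~400 msec, not longer than ~2000 msec."""
    return low <= ms <= high


if __name__ == "__main__":
    for bpm in (100, 120, 150):
        durations = [syllable_ms(bpm, n) for n in (1, 2, 4)]
        print(bpm, durations, [is_comfortable(ms) for ms in durations])
```

By this check, quarter time at 100 BPM (2400 msec) falls outside the comfortable range, while every other cell in the table falls within it.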
If you plan to export the same BGM at multiple pitches and tempos, use a folder and filename system that makes it easy to tell which .wav file is for which combination.
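As one possible scheme (purely illustrative, since any consistent naming works), this sketch puts each tempo in its own folder and includes both the pitch and the tempo in the file name. The folder and file names are hypothetical.

```python
# A hypothetical naming scheme: one folder per tempo, with pitch and tempo in
# the file name, e.g. "guideBGM/120bpm/guideBGM_C4_120bpm.wav".
from pathlib import Path


def bgm_path(root: Path, pitch: str, bpm: int, name: str = "guideBGM") -> Path:
    """Build an unambiguous path for one pitch/tempo combination."""
    return root / f"{bpm}bpm" / f"{name}_{pitch}_{bpm}bpm.wav"


if __name__ == "__main__":
    for bpm in (100, 120, 150):
        for pitch in ("C4", "F4", "A4"):
            print(bgm_path(Path("guideBGM"), pitch, bpm))
```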
Writing the Configuration File
The configuration file is a .txt file with the same name as its corresponding .wav file. If the names aren't identical, OREMO and RecStar won't know which BGM it goes with.
The format is very simple. The first line of the config declares one of three units of measurement: msec, sec, or sample. I recommend milliseconds, since that is what the oto uses, and it means you won't have to deal with decimals. (Admittedly, I don't know what "sample" refers to here; I haven't experimented with that unit.) Be sure not to include any whitespace in this declaration.
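Because the pairing depends only on the file name, it's easy to check a folder for BGMs that are missing their config. Here is a Python sketch that assumes the configs sit in the same folder as their .wav files, which is an assumption about your layout.

```python
# Sketch: list .wav files that have no config .txt with the same name
# (only the extension differs). Assumes configs live next to their .wav files.
from pathlib import Path


def unpaired_bgms(folder: Path) -> list[Path]:
    missing = []
    for wav in sorted(folder.glob("*.wav")):
        config = wav.with_suffix(".txt")  # "guideBGM_C4.wav" -> "guideBGM_C4.txt"
        if not config.is_file():
            missing.append(wav)
    return missing


if __name__ == "__main__":
    for wav in unpaired_bgms(Path(".")):
        print(f"No config found for {wav.name}")
```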
Below that, there will be one empty line, followed by the rest of the config. The format for these lines is as follows:
```
[event number],[timestamp],[start recording],[stop recording],[move to next string],[repeat BGM],[comment]
```
The event number starts at 1 and increases by 1 on each subsequent line of the config.
The timestamp is the position at which the event will occur. Playback of the BGM will always start at the timestamp declared in line 1. Make sure that each timestamp is larger than the one before it to prevent recording errors.
The next four numbers are binary flags that tell the software what type of event occurs at that position. Each will only ever be 0 or 1, and each event type should occur only once per BGM; otherwise, it can cause errors.
The repeat BGM event should go on the same line as the move to next string event.
The comment is what will be displayed at the bottom of the screen in OREMO when each event is occurring; this is not to be confused with a comment line headed by #.
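To make the field order concrete, here is a Python sketch that splits one event line into named fields. It assumes whole-number (msec) timestamps and treats everything after the sixth comma as comment text. Whether OREMO handles commas inside comments the same way is not confirmed here. The `Event` and `parse_event` names are hypothetical.

```python
# Sketch: split one event line into the seven fields of the format.
# Assumptions: timestamps are whole numbers (msec), and everything after the
# sixth comma is comment text, so commas inside a comment are kept.
from dataclasses import dataclass


@dataclass
class Event:
    number: int
    timestamp: int
    start_recording: int  # 1 = start recording at this position
    stop_recording: int   # 1 = stop recording at this position
    next_string: int      # 1 = move to the next string
    repeat_bgm: int       # 1 = play the BGM again
    comment: str          # text displayed while the event is occurring


def parse_event(line: str) -> Event:
    fields = line.rstrip("\r\n").split(",", 6)
    if len(fields) != 7:
        raise ValueError(f"expected 7 comma-separated fields, got {len(fields)}: {line!r}")
    number, timestamp, start, stop, nxt, repeat = (int(f) for f in fields[:6])
    return Event(number, timestamp, start, stop, nxt, repeat, fields[6])


if __name__ == "__main__":
    print(parse_event("2,1000,1,0,0,0,Recording start"))
```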
At minimum, a BGM config will have 2 lines (plus the unit declaration):
```
msec
# ↑ This is where the unit declaration will go.
# ← Indicates a comment line that will not affect the config.
# ↓ Put an empty line here.
1,0,1,0,0,0,Playback will start at the beginning of the .wav file and recording will begin immediately.
2,####,0,1,1,1,Recording and playback will stop at the specified position. The software will immediately move on to the next string and play the BGM again.
```
But it's better to separate the events to help with pacing and prevent recording errors. So a truly minimal BGM config should look more like this:
```
1,0,0,0,0,0,Playback will start at the beginning of the .wav file.
2,####,1,0,0,0,Recording will start at the specified position after playback begins.
3,####,0,1,0,0,Recording will stop at the specified position before playback ends.
4,####,0,0,1,1,Playback will end at the specified position. The software will move on to the next string and play the BGM again.
```
To include "empty" events — as in, events which only display comments and don't do anything else — simply set every event value to 0. These can go anywhere past the first line and before the move to next string event, but remember to number them as well.
Additionally, the repeat event line does not have to go back to line 1 if, for example, your BGM has an intro that you don't want to repeat every time. Likewise, the timestamp of line 1 does not have to be 0 if you don't want playback to start at the beginning of the .wav file.
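The rules so far can be checked mechanically before you test a BGM by ear: sequential event numbers, increasing timestamps, 0/1 flags, each event type exactly once, repeat on the move-to-next-string line, and nothing after that line. Here is a Python sketch. It encodes only the guidelines in this tutorial, not OREMO's or RecStar's actual parser, and it assumes msec timestamps. The function name is hypothetical.

```python
# Sketch: check a config's event lines against this tutorial's guidelines.
# Not OREMO's or RecStar's own validation; assumes integer msec timestamps.
FLAG_NAMES = ("start recording", "stop recording", "move to next string", "repeat BGM")


def check_events(lines: list[str]) -> list[str]:
    """Return problems found in the lines after the unit declaration."""
    problems = []
    counts = [0, 0, 0, 0]
    prev_time = None
    # Skip the empty line and any comment lines headed by "#".
    events = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    for expected_no, line in enumerate(events, start=1):
        fields = line.split(",", 6)
        if len(fields) != 7:
            problems.append(f"event line {expected_no}: expected 7 comma-separated fields")
            continue
        number, time = int(fields[0]), int(fields[1])
        flags = [int(f) for f in fields[2:6]]
        if number != expected_no:
            problems.append(f"event {number}: expected event number {expected_no}")
        if prev_time is not None and time <= prev_time:
            problems.append(f"event {number}: timestamp {time} is not after {prev_time}")
        prev_time = time
        if any(f not in (0, 1) for f in flags):
            problems.append(f"event {number}: flags must be 0 or 1")
        for i, f in enumerate(flags):
            counts[i] += f
        if flags[3] == 1 and flags[2] != 1:
            problems.append(f"event {number}: repeat BGM is not on the move to next string line")
        if flags[2] == 1 and expected_no != len(events):
            problems.append(f"event {number}: events appear after move to next string")
    for name, count in zip(FLAG_NAMES, counts):
        if count != 1:
            problems.append(f"{name} is set on {count} lines (expected exactly 1)")
    return problems
```

For a whole file, pass everything after the first line, for example `check_events(text.splitlines()[1:])`. An empty list means none of these rules were broken.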
Here's an example of a config for a BGM made for a 4-beat utterance at 120 BPM:
```
msec

1,0,0,0,0,0,Playback will start at the beginning of the .wav file.
2,1000,1,0,0,0,Recording starts 2 beats into the BGM. Remember that at this tempo 1 beat = 500 msec.
3,2000,0,0,0,0,This is a comment-only event marking the start of the utterance.
4,5000,0,1,0,0,Recording stops 2 beats after the utterance and 2 beats before the end of the BGM.
5,6000,0,0,1,1,The BGM ends at 12 beats (which in this case is the entire .wav file) then moves on to the next string and repeats.
```
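Every timestamp in that example is a beat count multiplied by 500 msec, so configs like this can be generated from beat positions. Here is a Python sketch that reproduces the five events of the 120 BPM example with shortened comments. The function and parameter names are hypothetical, and the event layout is just the one used in that example.

```python
# Sketch: generate a simple msec config from beat positions.
# Layout follows the 120 BPM example: playback start, recording start,
# utterance start (comment only), recording stop, then playback end
# together with move to next string and repeat.
def make_config(bpm: float, rec_start_beat: int, utt_start_beat: int,
                utt_beats: int, rec_pad_beats: int, total_beats: int) -> str:
    beat = 60_000 / bpm  # duration of one beat in msec

    def ms(beats: int) -> int:
        return round(beats * beat)

    rec_stop_beat = utt_start_beat + utt_beats + rec_pad_beats
    if not (0 < rec_start_beat < utt_start_beat and rec_stop_beat < total_beats):
        raise ValueError("events must be in increasing order within the BGM")
    rows = [
        (0, 0, 0, 0, 0, "Playback start"),
        (ms(rec_start_beat), 1, 0, 0, 0, "Recording start"),
        (ms(utt_start_beat), 0, 0, 0, 0, "Utterance start"),
        (ms(rec_stop_beat), 0, 1, 0, 0, "Recording end"),
        (ms(total_beats), 0, 0, 1, 1, "Playback end"),
    ]
    lines = ["msec", ""]  # unit declaration, then one empty line
    for number, (time, start, stop, nxt, repeat, comment) in enumerate(rows, start=1):
        lines.append(f"{number},{time},{start},{stop},{nxt},{repeat},{comment}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_config(120, rec_start_beat=2, utt_start_beat=4,
                      utt_beats=4, rec_pad_beats=2, total_beats=12))
```

Changing only `bpm` to 100 moves the same layout to 0, 1200, 2400, 6000, and 7200 msec.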
And here's an actual config file from one of my BGMs:
```
msec

1,0,0,0,0,0,Playback start || 再生始め
2,1000,1,0,0,0,Recording start || 録音始め
3,1500,0,0,0,0,Get ready... || せーの…
4,2000,0,0,0,0,Utterance start || 発声始め
5,4000,0,0,0,0,Utterance end || 発声終わり
6,5000,0,1,0,0,Recording end || 録音終わり
7,6000,0,0,1,1,Playback end || 再生終わり
```
Double-check your configs carefully for typos, since OREMO has few safeguards against errors, and always test your BGMs yourself before distributing them to make sure the timing works as intended.
One final note: if you are using OREMO and you have Japanese characters in your comments, remember to encode the config files in Shift JIS so that OREMO will display them correctly.
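If you write configs with a script, you can set the encoding when saving. Here is a Python sketch using the standard `shift_jis` codec. The config lines are taken from the example above with the empty line after the unit declaration included, and the file name is a placeholder.

```python
# Sketch: save a config whose comments contain Japanese as Shift JIS.
from pathlib import Path

CONFIG_LINES = [
    "msec",
    "",
    "1,0,0,0,0,0,Playback start || 再生始め",
    "2,1000,1,0,0,0,Recording start || 録音始め",
    "3,1500,0,0,0,0,Get ready... || せーの…",
    "4,2000,0,0,0,0,Utterance start || 発声始め",
    "5,4000,0,0,0,0,Utterance end || 発声終わり",
    "6,5000,0,1,0,0,Recording end || 録音終わり",
    "7,6000,0,0,1,1,Playback end || 再生終わり",
]


def save_shift_jis(path: Path, lines: list[str]) -> None:
    text = "\n".join(lines) + "\n"
    # The default errors="strict" raises UnicodeEncodeError for any character
    # Shift JIS cannot represent, instead of silently writing a substitute.
    path.write_text(text, encoding="shift_jis")


if __name__ == "__main__":
    # Placeholder name; it must match the corresponding .wav file's name.
    save_shift_jis(Path("guideBGM.txt"), CONFIG_LINES)
```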