I've seen some fairly good DiffSinger models made with around 10 minutes of data, and I know there's some way to do XLS so you wouldn't even need to record in Japanese. If you really want Japanese data, and a lot of it, you could record for a voice conversion AI such as RVC or Diff-SVC first and use that to generate singing data instead (at the cost of losing a bit of your own "personalized" singing style, though being selective about which audio you convert could probably minimize that). I don't know what the quality of DiffSinger models made with voice conversion tends to be, but I know it's a recommended method in the DiffSinger Google doc.
If you use the voice conversion method, there are probably some restrictions and/or legal concerns around which audio you're allowed to use, but I'm not sure what they'd be. If you have Discord, I think there's still an official(?) DiffSinger server you could ask in if you can't get an answer here.