In my experience with RVC, EQing and compressing the dataset can give clearer, more powerful results, but going even a bit overboard can screw everything up. I’ve found a nice balance by adding a ProQ (Bright Vocals preset, which makes it clearer without being too intense), plus the TDR Kotelnikov and TubeTech compressors to make it sound louder and more powerful.
Are we planning to clean the datasets or leave them as-is? Also, I think 200 voices at release is a stretch and, considering the size of NNSVS and DIFFSINGER voicebanks, would make the program 150+ gigabytes. In my opinion, it would be easier to settle on 7-10 voicebanks at release and add more gradually. 200 could easily overwhelm new users, and they would sound subpar if we’re aiming for a Q4 2025 or Q1 2026 release date, since that’s not enough time to train them all well.
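
For a rough sense of scale, here is the back-of-the-envelope math behind those numbers. It's only a sketch: it assumes every voicebank is about the same size and that the 150 GB figure for 200 voices is roughly right, neither of which has been measured. The function names are just made up for illustration.

```python
# Rough install-size estimate. Assumes all voicebanks are about the same size,
# which is a simplification; real NNSVS/DIFFSINGER voicebanks will vary.

def per_voice_gb(total_gb: float, voices: int) -> float:
    """Average size per voicebank implied by a total size estimate."""
    return total_gb / voices

def release_size_gb(voices: int, gb_per_voice: float) -> float:
    """Estimated total size for a given number of voicebanks."""
    return voices * gb_per_voice

# 150 GB for 200 voices is the estimate from this post, not a measured value.
PER_VOICE = per_voice_gb(150.0, 200)

for n in (7, 10, 200):
    print(f"{n} voicebanks: ~{release_size_gb(n, PER_VOICE):.2f} GB")
```

Under those assumptions, a 7-10 voicebank release lands somewhere around 5-8 GB, which is a lot easier to ship and download.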