Arpasing assistant would not be able to help you much. It's used by a lot of people who do arpasing so not having the tool you need to make proper sounding phonetics with your samples is kinda a bad idea.
Most of the people I know don't actually use the assistant to begin with. The reason I consider compatibility to be a critical factor is that it decides whether a reclist can be counted as Arpasing or not. I make a few exceptions to this rule because diphone samples using Arpabet phonemes that weren't included in Arpasing are consistent with the design of the method. But a reclist using strictly single phoneme samples would no longer be an Arpasing voicebank; it would be a single phoneme English voicebank using Arpabet encoding.
Strangely enough, Moresampler's OTO generator already generates single phoneme entries for vowels and for the consonant "n". I have to wonder why we need 300 "n" samples.
Memory-wise it's incredibly inefficient and the bank would be needlessly big because you have 1000+ samples.
This is incorrect. If you were to record individual phonemes for an English voicebank, you would only have 35 or so samples. The reason that voicebanks like VCV Japanese have a thousand OTO entries is that, by combining multiple phonemes into a single sample, you multiply the phoneme count by itself once for each additional phoneme in the sample. Let's design a reclist for a hypothetical language that has these phonemes: a i k s
A single phoneme reclist would look like this.
Code:
a i k s
A CVVC reclist would look like this.
Code:
a i
ka ak ki ik
sa as si is
And a VCV reclist would look like this.
Code:
a i aa ai ia ii
ka ki aka aki ika iki ak ik
sa si asa asi isa isi as is
The more phonemes we need to combine into a single sample, the more samples we need to cover all the possible combinations. OP is on the right track with recording single phonemes to reduce the size of the reclist.
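To put numbers on that, here is a minimal Python sketch that rebuilds the three reclists above for the hypothetical a i k s language and counts the samples. The vowel/consonant split and the function names are my own assumptions for illustration; this is not taken from any reclist tool.

```python
from itertools import product

# Hypothetical toy language from the example above.
VOWELS = ["a", "i"]
CONSONANTS = ["k", "s"]

def single_phoneme():
    """One sample per phoneme."""
    return VOWELS + CONSONANTS

def cvvc():
    """Lone vowels, plus every CV and every VC pair."""
    cv = [c + v for c, v in product(CONSONANTS, VOWELS)]
    vc = [v + c for v, c in product(VOWELS, CONSONANTS)]
    return VOWELS + cv + vc

def vcv():
    """Lone vowels, VV, CV, VCV and VC combinations."""
    vv = [a + b for a, b in product(VOWELS, VOWELS)]
    cv = [c + v for c, v in product(CONSONANTS, VOWELS)]
    vcv_ = [a + c + b for a, c, b in product(VOWELS, CONSONANTS, VOWELS)]
    vc = [v + c for v, c in product(VOWELS, CONSONANTS)]
    return VOWELS + vv + cv + vcv_ + vc

for name, build in [("single", single_phoneme), ("CVVC", cvvc), ("VCV", vcv)]:
    samples = build()
    print(f"{name}: {len(samples)} samples -> {' '.join(samples)}")
```

With only four phonemes the counts come out to 4, 10 and 22. As a rough upper bound for a language with 35 phonemes, there are 35 * 35 = 1225 ordered pairs, which is where the "1000+ samples" figure comes from once you start combining phonemes.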
Otoing would be at another level of hell.
Given that there are only a handful of samples, it would be fairly simple to OTO by hand. I suppose that running such a voicebank through Moresampler would actually result in entries like [- k] and [k -], which would only double the number of OTO entries, to about that of an average CV Japanese voicebank.
If you're someone who struggles with VCV/CVVC otoing as it is, then otoing single syllables like this for arpasing would sound like bird chirping from how short the samples are. Either that or your bank will sound like it's speaking binary.
We're starting to get to the actual reason that the single phoneme method would not work.
There are no transitions between the phonemes. By recording everything entirely separately, you must rely solely on the crossfade between envelopes to connect the sounds. The reason that CV often sounds "choppy" compared to VCV is that there is nothing connecting the end of one CV to the beginning of the next CV. The method described by OP takes this to a new extreme. I'm now interested in trying it out just to see if I could manage to get a decent result with it, but it's likely to be much more trouble than it's worth.
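For anyone wondering what "relying on the crossfade" means in practice, here is a rough Python sketch of a plain linear crossfade between two separately recorded samples. This is a generic illustration under my own assumptions (the `crossfade` function and the toy sample values are hypothetical), not how UTAU or any particular resampler actually implements its envelopes.

```python
def crossfade(a, b, overlap):
    """Join two lists of audio samples, linearly fading a out and b in
    over `overlap` samples. Hypothetical helper, for illustration only."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both samples")
    head = a[:-overlap]          # part of a that plays untouched
    tail = b[overlap:]           # part of b that plays untouched
    mixed = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)              # fade position, 0 < t < 1
        a_val = a[len(a) - overlap + i]
        mixed.append(a_val * (1 - t) + b[i] * t)
    return head + mixed + tail

# Toy data: a constant "vowel" fading into a silent "consonant".
vowel = [1.0, 1.0, 1.0, 1.0]
consonant = [0.0, 0.0, 0.0, 0.0]
print(crossfade(vowel, consonant, 2))
```

The crossfade only blends volume. It can't create the formant movement that a recorded transition (like the "ak" in a CVVC bank) contains, which is why the joins would be audible.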