I've done a lot of experimenting with multipitch voicebanks, and from what I've found, most leaps of more than six halfsteps do not transition well. For instance, the best result I've gotten out of a three-pitch voicebank was with pitches six halfsteps apart within one octave [ex. G3 + C#4 + G4]. Better still is the four-pitch with four halfsteps [G3 + B3 + D#4 + G4], and the best one that I've found without redundancy [that is, having near-identical pitches] has been the five-pitch with three halfsteps [G3 + A#3 + C#4 + E4 + G4]. Of course, this is based on my own voice, but from analyzing other people's voices and voicebanks, the results seem pretty consistent.
Since English voicebanks are very large, but you would like a two-octave range, I would suggest going with something like D3 + G#3 + D4 + G#4 + D5 [six-halfstep transitions], or, sticking closer to your selected pitches, D3 + G3 + C4 + F4 + B4 [five/six-halfstep transitions]. That way you get smoother transitions between notes, but the size is still manageable.
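If you want to check the spacing of a pitch set yourself before recording, here is a quick Python sketch that counts the halfsteps between neighbouring pitches. The function names are just my own for illustration (nothing here comes from UTAU itself), and it assumes sharp-only note names with a single-digit octave, like the ones used above.

```python
# Semitone offset of each note name within an octave (sharps only).
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_to_semitone(name):
    """Convert a name like 'C#4' to a semitone count, with C0 = 0.

    Assumes a single-digit octave at the end of the name.
    """
    pitch_class, octave = name[:-1], int(name[-1])
    return octave * 12 + NOTE_OFFSETS[pitch_class]


def halfstep_gaps(pitches):
    """Return the halfstep distance between each pair of neighbouring pitches."""
    numbers = [note_to_semitone(p) for p in pitches]
    return [high - low for low, high in zip(numbers, numbers[1:])]


# The two five-pitch suggestions from this post.
for bank in (
    ["D3", "G#3", "D4", "G#4", "D5"],
    ["D3", "G3", "C4", "F4", "B4"],
):
    print(" + ".join(bank), "->", halfstep_gaps(bank))
# D3 + G#3 + D4 + G#4 + D5 -> [6, 6, 6, 6]
# D3 + G3 + C4 + F4 + B4 -> [5, 5, 5, 6]
```

Any gap that comes out larger than six is a leap I would expect to transition poorly, going by my own results.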
The maximum number of pitches for an English bank is about six before UTAU starts having problems loading the OTO.
Hope this helps!