The green line is the overlap, which marks where UTAU will crossfade between samples. Don't set it too far in for hard consonants like k, g, t and d, as the crossfade can smear their sharp attack and make them sound off. The red line is the preutterance, which marks where the main body of the sample starts. UTAU uses it to 'place' the sample on a note: the preutterance point lines up with the start of the note. In a CV sample, it should cover the entire consonant, though in my experience you don't have to be incredibly precise with it for it to sound fine. Just don't get any of the vowel in there.
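To make the two lines a bit more concrete, here is a minimal Python sketch that reads one oto.ini entry and works out where the green and red lines fall in the recording. It assumes the common oto.ini layout `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with values in milliseconds and the preutterance and overlap measured from the offset (left blank). The names `OtoEntry`, `parse_oto_line` and `line_positions_ms` are hypothetical helpers for illustration, not part of UTAU.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # left blank, ms from the start of the .wav
    consonant: float     # fixed region, ms from the offset
    cutoff: float        # right blank (sign convention not handled here)
    preutterance: float  # red line, ms from the offset
    overlap: float       # green line, ms from the offset


def parse_oto_line(line):
    """Split one 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    filename, params = line.strip().split("=", 1)
    alias, offset, consonant, cutoff, preutterance, overlap = params.split(",")
    return OtoEntry(filename, alias, float(offset), float(consonant),
                    float(cutoff), float(preutterance), float(overlap))


def line_positions_ms(entry):
    """Absolute positions (ms from the start of the .wav) of the green and red lines."""
    return {
        "overlap": entry.offset + entry.overlap,
        "preutterance": entry.offset + entry.preutterance,
    }


if __name__ == "__main__":
    # Hypothetical entry for a CV sample "ka"; the numbers are made up.
    entry = parse_oto_line("ka.wav=ka,100,150,-300,120,40")
    print(entry.alias, line_positions_ms(entry))
```

With these made-up numbers, the green line sits 40 ms into the sample and the red line 120 ms in, so the whole "k" would need to fit between the offset and the red line.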
Oto'ing is generally easier with a spectrogram than with the waveform alone, since the spectrogram shows more clearly where phonemes begin and end. You can easily tell the consonant and vowel apart, as the vowel shows up as a regular, repeating pattern.
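The vowel's "repeating pattern" is exactly what a periodicity measure picks up. As a rough, hypothetical helper (not part of UTAU or any oto tool), the Python sketch below scans a mono recording frame by frame and reports the first frame whose autocorrelation shows a clear pitch period. In a CV sample with an unvoiced consonant, that is roughly where the vowel, and so the red preutterance line, begins. It uses autocorrelation instead of a spectrogram, and it assumes the samples come as a list of floats, with a 50 ms frame, a 50 to 400 Hz pitch range and a threshold of 0.6. All of these are guesses to tune. Voiced consonants like m, n or g are periodic too, so on those it will usually stop early.

```python
import math
import random


def normalized_autocorr(frame, lag):
    """Correlation (-1..1) between the frame and itself shifted by `lag` samples."""
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def periodicity(frame, sample_rate, min_hz=50.0, max_hz=400.0):
    """Best autocorrelation over lags that match pitches between min_hz and max_hz."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    return max(normalized_autocorr(frame, lag) for lag in range(min_lag, max_lag + 1))


def estimate_vowel_start_ms(samples, sample_rate, frame_ms=50.0, threshold=0.6):
    """Start time (ms) of the first clearly periodic frame, or None if none is found."""
    frame_len = int(sample_rate * frame_ms / 1000)
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if periodicity(frame, sample_rate) >= threshold:
            return start * 1000.0 / sample_rate
    return None


if __name__ == "__main__":
    # Synthetic test signal: 100 ms of noise (a stand-in for an unvoiced
    # consonant) followed by 100 ms of a 200 Hz tone (a stand-in for a vowel).
    rate = 8000
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(estimate_vowel_start_ms(noise + tone, rate))
```

Because the frames don't overlap, the estimate is only as fine as the frame length. In practice you would still check the result against the spectrogram and nudge the line by eye.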