Make sure that you're in Mode 2. Mode 2 matters most for pitch manipulation, which is more relevant to singing vocals, but it also enables the envelope view.
To ensure that samples blend together smoothly, you can press one of the envelope crossfade buttons (either P1P4 or P2P3). Each note is then faded at the ends where it connects to a neighboring note.
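If it helps to picture what a crossfade does to the audio, here is a rough sketch of the general idea in Python. It is only an illustration of a plain linear crossfade between two lists of samples, not how the P1P4 or P2P3 buttons actually work internally; the function name `crossfade` and the `overlap` parameter are made up for this example.

```python
def crossfade(a, b, overlap):
    """Join two sample lists, blending `overlap` samples with a linear fade.

    Hypothetical helper for illustration only: the end of `a` fades out
    while the start of `b` fades in, and the two are summed in the overlap.
    """
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter note length")

    head = a[:-overlap]   # part of the first note left untouched
    tail = b[overlap:]    # part of the second note left untouched

    blended = []
    for i in range(overlap):
        # Fade position, strictly between 0 and 1 across the overlap.
        t = (i + 1) / (overlap + 1)
        fading_out = a[len(a) - overlap + i] * (1 - t)
        fading_in = b[i] * t
        blended.append(fading_out + fading_in)

    return head + blended + tail


# A loud note followed by silence: the join ramps down instead of cutting off.
print(crossfade([1.0] * 4, [0.0] * 4, 3))
```

Without the fade, the join would jump straight from 1.0 to 0.0, which is the kind of abrupt change that tends to be heard as a click between notes.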
As for transcribing the lyrics, it helps to become more familiar with the phonology of both Japanese and English. Here are the Wikipedia articles you can read to find the similarities between the sounds of the two languages.
en.wikipedia.org
en.wikipedia.org
Remember, English spelling doesn't correspond 100% to its pronunciation. Think more about how a word sounds than what letters are used to spell it.