If you don't care about things like kana support, you can call it a day by just putting the samples in a folder, depending on how the samples were recorded. In the vast majority of cases, however, voicebanks need setup files to work as intended, and oto.ini is the most important of them. It contains the settings that define how each individual sample is processed. I'm pretty sure there is a tutorial on making otos published here. You might want to change your regional settings before creating the oto, though, since the synthesis engine may not parse it correctly if the system uses the wrong decimal symbol, typically a comma instead of a period (this issue is well documented for the original UTAU; I'm not sure about OpenUtau). You might also want to include frequency maps if you plan to distribute your voicebank.
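To make the decimal symbol problem more concrete, here is a rough Python sketch of how an oto.ini line is laid out and why a comma as the decimal symbol breaks it. Each line follows the pattern `filename=alias,offset,consonant,cutoff,preutterance,overlap`, with the five numbers in milliseconds. The names `OtoEntry` and `parse_oto_line` are made up for this example, and treating empty fields as 0 is my assumption; this is not how UTAU or OpenUtau actually implement their parsing.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # left blank, in ms
    consonant: float     # fixed (unstretched) region, in ms
    cutoff: float        # right blank, in ms (may be negative)
    preutterance: float  # in ms
    overlap: float       # in ms


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one oto.ini line (hypothetical helper, for illustration only)."""
    filename, _, params = line.strip().partition("=")
    fields = params.split(",")
    # A comma used as the decimal symbol adds extra fields, so the count is off.
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    alias = fields[0]
    # Assumption: an empty numeric field is treated as 0.
    numbers = [float(f) if f.strip() else 0.0 for f in fields[1:]]
    return OtoEntry(filename, alias, *numbers)


good = parse_oto_line("ka.wav=ka,45.5,120,-300.25,80,20")

try:
    # Same values written with a comma as the decimal symbol.
    parse_oto_line("ka.wav=ka,45,5,120,-300,25,80,20")
    comma_line_rejected = False
except ValueError:
    comma_line_rejected = True
```

Since the fields themselves are separated by commas, a value like `45,5` is indistinguishable from two separate fields, which is why the period has to be the decimal symbol.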
As for the character image, you specify it in a separate setup file called character.txt, which goes in the voicebank's root folder.
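For reference, character.txt is just a short list of `key=value` lines, and the two you need here are `name` and `image`. Below is a minimal Python sketch that builds such a file's contents. The function name `build_character_txt`, the voicebank name, and the image file name are all placeholders I picked for the example; check the image format and text encoding your editor expects before writing the file to disk.

```python
def build_character_txt(name: str, image: str) -> str:
    """Return the text of a minimal character.txt (hypothetical helper)."""
    # "image" is a path relative to the voicebank folder.
    return f"name={name}\nimage={image}\n"


# Placeholder values; replace with your own voicebank name and image file.
contents = build_character_txt("Example Voice", "icon.bmp")
print(contents)
```

The resulting file has one `name=` line and one `image=` line, which you can just as easily type by hand in a text editor.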