After the automatic alignment and labeling (often called ``autolabeling'') is finished, the results are visually inspected, either by random sampling or by checking every one, using display tools such as emulabel [4]. We also run a number of scripts that check the durations and other features of each segment to automatically find those that are quite obviously wrong, so that we can go in with emulabel and correct them by hand.
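As an illustration of the kind of duration check described above, the following is a minimal sketch, not the scripts actually used. It assumes each label has already been read into a (label, start, end) triple with times in seconds; the names Segment and flag_suspect_segments, and the thresholds of 20 ms and 400 ms, are hypothetical choices made only for this example.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One labeled segment (hypothetical structure; times in seconds)."""
    label: str
    start: float
    end: float


def flag_suspect_segments(
    segments: List[Segment],
    min_dur: float = 0.02,  # assumed lower bound, not taken from the paper
    max_dur: float = 0.40,  # assumed upper bound, not taken from the paper
) -> List[Tuple[int, str, str]]:
    """Return (index, label, reason) for segments that look obviously wrong."""
    suspects = []
    prev_end = None
    for i, seg in enumerate(segments):
        dur = seg.end - seg.start
        if dur <= 0:
            suspects.append((i, seg.label, "non-positive duration"))
        elif dur < min_dur:
            suspects.append((i, seg.label, "too short"))
        elif dur > max_dur:
            suspects.append((i, seg.label, "too long"))
        # A segment starting before the previous one ended is also suspect.
        if prev_end is not None and seg.start < prev_end:
            suspects.append((i, seg.label, "overlaps previous segment"))
        prev_end = seg.end
    return suspects


if __name__ == "__main__":
    example = [
        Segment("pau", 0.0, 0.25),
        Segment("t", 0.25, 0.255),
        Segment("aa", 0.255, 1.0),
        Segment("k", 0.9, 0.98),
    ]
    for index, label, reason in flag_suspect_segments(example):
        print(index, label, reason)
```

Only the flagged segments would then need to be opened in a display tool for hand correction. In practice the bounds would probably be set per phone class rather than globally, since a plausible duration for a stop differs from that of a vowel.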
In collecting these diphone sets we have noted that the quality of autolabeling has improved, since we typically use the previous set as the prompts. Because this is the same voice, delivered (mostly) in the same style, later versions have not required full checking; sampling and targeting likely problems alone have been sufficient.