HARK Cookbook: Evaluating the speech recognition

14.6.2 Evaluating the speech recognition

The next step is to evaluate the success rate of speech recognition using the evaluation script score.py. You need to run the script once for each sound direction: 60, 0, and -60 degrees. Run these three commands:

    python score.py result.txt transcription_list1.txt 60 10
    python score.py result.txt transcription_list3.txt 0 10
    python score.py result.txt transcription_list2.txt -60 10
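The three invocations differ only in the reference file and the direction, so they can be driven from a small table. The following is a minimal sketch, not part of the HARK distribution: the helper names `build_commands` and `run_all` are hypothetical, and it assumes score.py and the text files are in the current directory.

```python
import subprocess
import sys

# (reference file, direction in degrees) pairs copied from the commands above.
RUNS = [
    ("transcription_list1.txt", 60),
    ("transcription_list3.txt", 0),
    ("transcription_list2.txt", -60),
]


def build_commands(log="result.txt", tolerance=10):
    """Return one argument list per direction, in the order used above."""
    return [
        [sys.executable, "score.py", log, reference, str(direction), str(tolerance)]
        for reference, direction in RUNS
    ]


def run_all():
    """Run score.py once per direction (assumes score.py is in the current directory)."""
    for command in build_commands():
        print(" ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all()` from the directory that contains score.py runs the three evaluations in sequence and stops at the first one that exits with an error.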

The arguments are, in order, the speech recognition log, the reference data, the sound direction, and the tolerance, with the direction and tolerance given in degrees. For example, the first command evaluates the words recognized from 50 to 70 degrees by comparing the log “result.txt” against the reference “transcription_list1.txt”.
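The direction and tolerance together define a window of accepted directions. The sketch below illustrates that window only; it is not taken from score.py, the function name `in_window` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def in_window(direction, target, tolerance):
    """Return True if direction lies within target +/- tolerance (degrees).

    Inclusive bounds are an assumption; score.py may treat the edges differently.
    """
    return abs(direction - target) <= tolerance


# With target 60 and tolerance 10, directions from 50 to 70 degrees are accepted.
print(in_window(55, 60, 10))   # True
print(in_window(0, 60, 10))    # False
```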

After you run the script, you will see a result like Fig. 14.35. From left to right, the columns show whether the recognition succeeded, the recognition result, and the reference. The last line shows the overall success rate. In this case, 17 out of 20 utterances were recognized successfully, so the success rate is 85%.

Result	Recognition	Correct
Success	"A set of fried chicken"	"A set of fried chicken"
Success	"A set of fried pork"	"A set of fried pork"
Success	"A set of baked fish"	"A set of baked fish"
Success	"Steak"	"Steak"
Fail	"A set of fried chicken"	"Matsusaka beef steak"
	(skipped)
Success	"Coffee"	"Coffee"
17 / 20 (85.0 %)

Figure 14.35: Recognition result. Note that these words are translated.
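The summary line is the number of successes divided by the number of utterances. The following sketch reproduces that arithmetic under the assumption that an utterance counts as a success when the recognized text equals the reference exactly; the actual comparison in score.py is not shown here, and the name `summarize` is hypothetical.

```python
def summarize(pairs):
    """Count exact matches in (recognized, reference) pairs and format the rate."""
    total = len(pairs)
    successes = sum(1 for recognized, reference in pairs if recognized == reference)
    rate = 100.0 * successes / total if total else 0.0
    return successes, total, f"{successes} / {total} ({rate:.1f} %)"


# Only the six rows visible in Fig. 14.35; the skipped rows are not reproduced.
visible_rows = [
    ("A set of fried chicken", "A set of fried chicken"),
    ("A set of fried pork", "A set of fried pork"),
    ("A set of baked fish", "A set of baked fish"),
    ("Steak", "Steak"),
    ("A set of fried chicken", "Matsusaka beef steak"),
    ("Coffee", "Coffee"),
]
print(summarize(visible_rows)[2])  # 5 / 6 (83.3 %)
```

Because the figure omits most rows, this example yields 5 / 6 rather than the 17 / 20 reported for the full run; with 17 matches in 20 pairs the same function returns "17 / 20 (85.0 %)".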

For every direction, the success rate should be between 70% and 90%. If the rate is extremely low, check that you specified the correct pair of direction and reference data. If the rate is still low, the separation or the recognition may have failed. Listen to the files in wav/ to check whether the separation succeeded, or refer to the recipes in Chapter 3.