2.2 Learning sound localization

Problem

I want to perform sound source localization in HARK but don’t know where to start.

Solution

(1) Source localization of an audio file

\includegraphics[width=.5\linewidth ]{fig/recipes/LearningHARK_002_01_1}
(2.3.a) MAIN Subnetwork

\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_01_2}
(2.3.b) Iterator Subnetwork
Figure 2.4: HARK network file for sound source localization using a .wav file

Fig. 2.4 shows an example of a HARK network file for sound source localization using a .wav file as input. The .wav file contains multi-channel signals recorded by a microphone array. The network localizes the sound sources and displays their locations.

For the node property settings in the network file, see Section 6.2 in the HARK document.

We provide an example of a HARK network file, called “recog.n”, which includes sound source localization, in the HARK Automatic Speech Recognition Pack.

For a first simple test, download and unzip the HARK Automatic Speech Recognition Pack. Go to the unzipped directory and type the following command.

./recog.n MultiSpeech.wav loc_tf.dat sep_tf.dat

You will then see a localization result, as in Fig. 2.5. If you see the window and the localization result, the network is working correctly.

\includegraphics[width=90mm]{fig/recipes/LearningHARK_002_02_1.eps}
Figure 2.5: Snapshot of the sound source localization result using recog.n

(2) Real-time sound source localization from a microphone

Fig. 2.6 shows an example of a HARK network file for real-time sound source localization using a microphone array.

\includegraphics[width=.3\linewidth ]{fig/recipes/LearningHARK_002_03_1}
(2.5.a) MAIN Subnetwork

\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_03_2}
(2.5.b) Iterator Subnetwork
Figure 2.6: HARK network file for sound source localization using a microphone array

Here, AudioStreamFromWave in Fig. 2.4 is replaced by AudioStreamFromMic. By setting the parameters of AudioStreamFromMic properly, sound sources can be localized in real time using a microphone array. For the settings of these parameters, see Section 6.2 in the HARK document. If the network file works properly, you will see a localization result as in Fig. 2.5. If it does not work properly, read the recipes “Sound recording fails” or “Sound source localization fails”.

(3) Sound source localization with suppression of constant noise

The sound source localization shown in Fig. 2.4 and Fig. 2.6 cannot determine which sound sources are desired. If there are several sources of high-power noise in your environment, LocalizeMUSIC will localize only the noise. In the worst case, it cannot localize speech at all, which drastically degrades the performance of automatic speech recognition.

This is especially true for automatic speech recognition with a robot-embedded microphone array, where several sources of high-power noise related to the robot's motors and fans degrade the performance of the entire system.

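To solve this problem, HARK supports a pre-measured noise suppression function in sound source localization. To enable this function, two steps are needed:

(3-1) Generation of pre-measured noise files for localization
(3-2) Sound source localization with the noise files

The next two sections explain (3-1) and (3-2), respectively.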

(3-1) Generation of pre-measured noise files for localization

\includegraphics[width=.6\linewidth ]{fig/recipes/LearningHARK_002_04_1}
(2.6.a) MAIN Subnetwork

\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_04_2}
(2.6.b) Iterator Subnetwork
Figure 2.7: HARK network file for generating the noise files for sound source localization

Fig. 2.7 shows an example of a HARK network file for generating pre-measured noise files for sound source localization. For the parameter settings of the HARK nodes, see Section 6.2 in the HARK document. The Iterator (LOOP0) subnetwork in Fig. 2.7 has 3 Constant nodes, an IterCount node, a Smaller node, and an Equal node. The parameter settings for those nodes are:

Here, we set the VALUE of node_Constant_1 to 200. We therefore set MAX_SUM_COUNT in CMMakerFromFFTwithFlag to a value greater than 200.

This network file takes a .wav file containing only noise as input. The VALUE of node_Constant_1 determines how many frames are used to generate the noise files.

When you run the network file, two files, named NOISEr.dat and NOISEi.dat, will appear in your current directory. These two files are used for sound source localization with the noise suppression function.
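As a rough illustration of what such noise files could contain, the sketch below averages the outer product of the multi-channel spectrum over noise-only frames for a single frequency bin, then splits the result into real and imaginary parts. This is plain Python, not HARK code. That the two files hold the real and imaginary parts of a noise correlation matrix is an assumption based on their names; the function name correlation_matrix, the frame values, and the two-microphone setup are all made up for this example, and the actual HARK file format is not reproduced.

```python
# Illustrative sketch only: plain Python, not HARK code or the HARK file format.
# correlation_matrix and the sample values are hypothetical.

def correlation_matrix(frames):
    """Average x * x^H over frames for one frequency bin.

    Each frame is a list of complex spectrum values, one per microphone.
    """
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    for x in frames:
        for a in range(m):
            for b in range(m):
                acc[a][b] += x[a] * x[b].conjugate()
    n = len(frames)
    return [[v / n for v in row] for row in acc]

# Two microphones, three noise-only frames (made-up values).
noise_frames = [
    [1 + 1j, 1 - 1j],
    [2 + 0j, 0 + 2j],
    [0 + 1j, 1 + 0j],
]
K = correlation_matrix(noise_frames)

# Assumed split into a real part and an imaginary part.
real_part = [[v.real for v in row] for row in K]
imag_part = [[v.imag for v in row] for row in K]

print(real_part)
print(imag_part)
```

The resulting matrix is Hermitian, so its diagonal is real and the off-diagonal entries are complex conjugates of each other.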

In this example, we used the first 200 frames to generate the noise files. By using a condition other than that of the Smaller node, you can specify which frames are used for the generation.
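The frame selection described above can be pictured with a small Python sketch. It is not HARK code: select_frames and the conditions are hypothetical names, the total of 1000 frames is arbitrary, and the frame counter is assumed to start at 0. The first condition mimics a Smaller-style comparison against the constant 200; the second shows how a different condition would pick a later window of frames.

```python
# Illustrative sketch only: plain Python, not HARK code.
# select_frames and the conditions below are hypothetical.

def select_frames(num_frames, use_frame):
    """Return the indices of frames for which use_frame(index) is true."""
    return [i for i in range(num_frames) if use_frame(i)]

LIMIT = 200  # plays the role of the VALUE of node_Constant_1

# Condition in the style of the Smaller node: frame index < LIMIT.
# Assumes the frame counter starts at 0.
first_frames = select_frames(1000, lambda i: i < LIMIT)

# A different condition: use a later window of frames instead.
later_frames = select_frames(1000, lambda i: 300 <= i < 500)

print(len(first_frames), first_frames[0], first_frames[-1])
print(len(later_frames), later_frames[0], later_frames[-1])
```

In both cases 200 frames are selected, so a MAX_SUM_COUNT greater than 200 would still cover them.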

(3-2) Sound source localization with the noise files

\includegraphics[width=.5\linewidth ]{fig/recipes/LearningHARK_002_05_1}
(2.7.a) MAIN Subnetwork

\includegraphics[width=.9\linewidth ]{fig/recipes/LearningHARK_002_05_2}
(2.7.b) Iterator Subnetwork
Figure 2.8: HARK network file for sound source localization with pre-measured noise suppression

Fig. 2.8 shows an example of a HARK network file for sound source localization using the noise files created in (3-1), NOISEr.dat and NOISEi.dat. For the parameter settings of the HARK nodes, see Section 6.2 in the HARK document. The Iterator (LOOP0) subnetwork in Fig. 2.8 has 3 Constant nodes, and the parameter settings for those nodes are:

CMLoad reads the noise files, NOISEr.dat and NOISEi.dat, which are used to whiten the noise in sound source localization. To enable the noise suppression function, set MUSIC_ALGORITHM in LocalizeMUSIC to GEVD or GSVD. Details of the noise suppression algorithm are described in Section 6.2 of the HARK document.
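For orientation, a generalized eigenvalue decomposition (the GEVD in the option name) is usually written as follows. This is the textbook form, not a transcription of the HARK implementation. The symbols are notation introduced for this sketch: R is the correlation matrix of the input signal at frequency ω, K is the noise correlation matrix (assumed here to be what the noise files provide), and e_m and λ_m are the m-th generalized eigenvector and eigenvalue for an array of M microphones.

```latex
\mathbf{R}(\omega)\,\mathbf{e}_m(\omega)
  = \lambda_m(\omega)\,\mathbf{K}(\omega)\,\mathbf{e}_m(\omega),
\qquad m = 1, \dots, M
```

When K is the identity matrix, this reduces to the ordinary eigenvalue decomposition of R, that is, no noise whitening is applied.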

When you run the HARK network file, you will see sound source localization results similar to those in Fig. 2.5. Compared with localization without noise suppression, the results focus more on speech sources.

Discussion

For all the details about the algorithm and noise suppression in LocalizeMUSIC, see Section 6.2 in the HARK document. To increase accuracy, read the recipe in Chapter 8 or the descriptions of the LocalizeMUSIC and SourceTracker nodes in the HARK document, and tune their parameters.

See Also

Sound recording fails, Sound source localization fails