2.2 Learning sound localization

Problem

I want to perform sound source localization in HARK but don’t know where to start.

Solution

(1) Source localization of an audio file

$\includegraphics[width=.5\linewidth ]{fig/recipes/LearningHARK_002_01_1}$

(2.3.a) MAIN Subnetwork

$\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_01_2}$

(2.3.b) Iterator Subnetwork

Figure 2.4: HARK network file for sound source localization using a .wav file

Fig. 2.4 shows an example of a HARK network file for sound source localization using a .wav file input. The .wav file contains multi-channel signals recorded by a microphone array. The network localizes the sound sources in the file and displays their locations.

For the node property settings in the network file, see Section 6.2 in the HARK document.
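Before running a network on a recording, it can help to confirm that the .wav file really is multi-channel. The following is a minimal sketch using only Python's standard `wave` module; it is not part of HARK, and the function names `describe_wav` and `check_multichannel` are hypothetical helpers introduced for illustration. It assumes a plain PCM .wav file; the channel count your network expects depends on your microphone array and is not checked here.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bytes_per_sample": wav.getsampwidth(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def check_multichannel(path, min_channels=2):
    """Raise ValueError if the file has fewer than min_channels channels.

    min_channels=2 is only an illustrative lower bound; a real microphone
    array recording should match the number of microphones in the array.
    """
    info = describe_wav(path)
    if info["channels"] < min_channels:
        raise ValueError(
            f"{path} has {info['channels']} channel(s); "
            f"expected at least {min_channels}"
        )
    return info
```

For example, `check_multichannel("MultiSpeech.wav")` returns the header information if the file has at least two channels and raises `ValueError` otherwise.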

We provide an example of a HARK network file, called “recog.n”, which includes sound source localization, in the HARK Automatic Speech Recognition Pack.

For the first simple test, download and unzip the HARK Automatic Speech Recognition Pack. Go to the unzipped directory and type the following command.

./recog.n MultiSpeech.wav loc_tf.dat sep_tf.dat

You will then see a localization result, as in Fig. 2.5. If you see the window and the localization result, the network is working correctly.

$\includegraphics[width=90mm]{fig/recipes/LearningHARK_002_02_1.eps}$

Figure 2.5: Snapshot of the sound source localization result using recog.n

(2) Real-time sound source localization from a microphone

Fig. 2.6 shows an example of a HARK network file for real-time sound source localization using a microphone array.

$\includegraphics[width=.3\linewidth ]{fig/recipes/LearningHARK_002_03_1}$

(2.5.a) MAIN Subnetwork

$\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_03_2}$

(2.5.b) Iterator Subnetwork

Figure 2.6: HARK network file for sound source localization using a microphone array

Here, AudioStreamFromWave in Fig. 2.4 is replaced by AudioStreamFromMic. By properly setting the parameters of AudioStreamFromMic, a sound source can be localized in real time using a microphone array. For the settings of these parameters, see Section 6.2 in the HARK document. If the network file works properly, you will see a localization result as in Fig. 2.5. If it does not work properly, read the recipes “Sound recording fails” or “Sound source localization fails”.

(3) Sound source localization with suppression of constant noise

The sound source localization shown in Fig. 2.4 and Fig. 2.6 cannot determine which sound sources are desired. If there are several high-power noise sources in your environment, LocalizeMUSIC may localize only the noise. In the worst case, it cannot localize speech, resulting in a drastic degradation of automatic speech recognition performance.

This is especially true for automatic speech recognition by a robot-embedded microphone array, in which there are several sources of high power noise related to the robot motor and fan, degrading the performance of the entire system.

To solve this problem, HARK supports a pre-measured noise suppression function in sound source localization. To enable this function, two steps are needed:

(3-1) Generation of pre-measured noise files for localization
(3-2) Sound source localization with these noise files

The next two sections explain (3-1) and (3-2), respectively.

(3-1) Generation of pre-measured noise files for localization

$\includegraphics[width=.6\linewidth ]{fig/recipes/LearningHARK_002_04_1}$

(2.6.a) MAIN Subnetwork

$\includegraphics[width=.8\linewidth ]{fig/recipes/LearningHARK_002_04_2}$

(2.6.b) Iterator Subnetwork

Figure 2.7: HARK network file for generating the noise files for sound source localization

Fig. 2.7 shows an example of a HARK network file for generating pre-measured noise files for sound source localization. For the parameter settings of the HARK nodes, see Section 6.2 in the HARK document. The Iterator (LOOP0) subnetwork in Fig. 2.7 has 3 Constant nodes, an IterCount node, a Smaller node, and an Equal node. The parameter settings for those nodes are:

node_Constant_1
- VALUE
  int type. VALUE = 200.
  This represents the number of frames, counted from the first frame, used to generate the noise file.
node_Constant_2
- VALUE
  string type. VALUE = NOISEr.dat.
  File name for the real part of the noise file.
node_Constant_3
- VALUE
  string type. VALUE = NOISEi.dat.
  File name for the imaginary part of the noise file.
node_IterCount_1
- No parameter
  This outputs the index of the HARK processing frames.
node_Smaller_1
- No parameter
  This determines if the index of HARK processing frames is smaller than a specific number.
node_Equal_1
- No parameter
  This determines if the index of HARK processing frames is equal to a specific number.

Here, the VALUE of node_Constant_1 is set to 200. MAX_SUM_COUNT in CMMakerFromFFTwithFlag should therefore be set to a value greater than 200.

This network file takes as input a .wav file containing only noise. The VALUE of node_Constant_1 determines how many frames are used to generate the noise files.

When you run the network file, two files, named NOISEr.dat and NOISEi.dat, will appear in your current directory. These two files are used for sound source localization with the noise suppression function.

In this example, we used 200 frames from the first frame to generate the noise file. By using conditions other than those of the Smaller node, you can specify which frame you will use for the generation.
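The frame-selection logic of the Iterator subnetwork can be sketched in a few lines of Python. This is only an illustration of the conditions formed by the IterCount, Smaller, and Equal nodes, not HARK code: the function names are hypothetical, the frame index is assumed to start at 0, and how the two flags are wired into the nodes that accumulate and save the noise information should be checked against the network file itself.

```python
def frame_flags(frame_index, n_noise_frames=200):
    """Return (add_frame, write_files) for one processing frame.

    add_frame   mirrors the Smaller node: index < n_noise_frames
    write_files mirrors the Equal node:   index == n_noise_frames
    """
    add_frame = frame_index < n_noise_frames
    write_files = frame_index == n_noise_frames
    return add_frame, write_files


def simulate(total_frames, n_noise_frames=200):
    """Run the gating logic over total_frames frames.

    Returns the list of frame indices that contribute to the noise
    estimate and the frame index at which the files would be written
    (None if the input is too short to reach that frame).
    """
    used_frames = []
    written_at = None
    for frame_index in range(total_frames):  # stand-in for IterCount
        add_frame, write_files = frame_flags(frame_index, n_noise_frames)
        if add_frame:
            used_frames.append(frame_index)
        if write_files:
            written_at = frame_index
    return used_frames, written_at
```

Under these assumptions, `simulate(500)` uses frames 0 to 199 and triggers the write at frame 200, while `simulate(100)` never triggers the write, which suggests that the noise recording should contain more frames than the VALUE of node_Constant_1. Replacing the `frame_index < n_noise_frames` condition with another one, for instance a range check, selects a different set of frames.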

(3-2) Sound source localization with the noise files

$\includegraphics[width=.5\linewidth ]{fig/recipes/LearningHARK_002_05_1}$

(2.7.a) MAIN Subnetwork

$\includegraphics[width=.9\linewidth ]{fig/recipes/LearningHARK_002_05_2}$

(2.7.b) Iterator Subnetwork

Figure 2.8: HARK network file for sound source localization with pre-measured noise suppression

Fig. 2.8 shows an example of a HARK network file for sound source localization using the noise files created in (3-1), NOISEr.dat and NOISEi.dat. For the parameter settings of the HARK nodes, see Section 6.2 in the HARK document. The Iterator (LOOP0) subnetwork in Fig. 2.8 has 3 Constant nodes, and the parameter settings for those nodes are:

node_Constant_1
- VALUE
  string type. VALUE = NOISEr.dat.
  File name for the real part of the loaded noise file.
node_Constant_2
- VALUE
  string type. VALUE = NOISEi.dat.
  File name for the imaginary part of the loaded noise file.
node_Constant_3
- VALUE
  int type. VALUE = 0.
  This flag controls whether the noise information is updated every frame. If 0, the noise files are loaded only at the first frame.

CMLoad reads the noise files, NOISEr.dat and NOISEi.dat, and whitens the noise in sound source localization. To enable the noise suppression function, set MUSIC_ALGORITHM in LocalizeMUSIC to GEVD or GSVD. The details of the algorithm for the noise suppression are described in Section 6.2 of the HARK document.
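As a rough guide to what whitening means here, the following is a sketch of the standard generalized eigenvalue formulation of MUSIC. The symbols are assumptions of this sketch and may differ from the notation in the HARK document, which remains the authoritative description: $R(\omega)$ is the correlation matrix of the input signal at frequency $\omega$, $K(\omega)$ is the noise correlation matrix loaded from the noise files, $a(\theta,\omega)$ is the steering vector for direction $\theta$, $M$ is the number of microphones, and $L$ is the assumed number of sources.

```latex
% Generalized eigenvalue decomposition (GEVD) of R with respect to K
R(\omega)\, e_m(\omega) = \lambda_m(\omega)\, K(\omega)\, e_m(\omega),
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M

% Equivalent whitened form, for Hermitian positive definite K
K^{-1/2}(\omega)\, R(\omega)\, K^{-1/2}(\omega)\, u_m(\omega)
  = \lambda_m(\omega)\, u_m(\omega),
\qquad u_m(\omega) = K^{1/2}(\omega)\, e_m(\omega)

% MUSIC spatial spectrum built from the M - L smallest-eigenvalue vectors
P(\theta, \omega) =
  \frac{\left| a^{H}(\theta,\omega)\, a(\theta,\omega) \right|}
       {\sum_{m=L+1}^{M} \left| a^{H}(\theta,\omega)\, e_m(\omega) \right|}
```

When $K(\omega)$ is the identity matrix, the first equation reduces to an ordinary eigenvalue decomposition of $R(\omega)$, that is, localization without noise suppression.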

When you run the HARK network file, you will see sound source localization results similar to those in Fig. 2.5. Compared with localization without noise suppression, you will see a greater focus on speech localization.

Discussion

For all the details about the algorithm and noise suppression in LocalizeMUSIC, see Section 6.2 in the HARK document. To increase accuracy, read the recipe in Chapter 8 or the descriptions of the LocalizeMUSIC and SourceTracker nodes in the HARK document, and tune their parameters.