HARK Document Version 2.5.0. (Revision: 9008) : SourceTrackerPF

6.2.20 SourceTrackerPF

6.2.20.1 Outline of the node

A node compatible with SourceTracker. SourceTracker assigns the same ID to source localization results when the sound sources are sufficiently close. This node instead assigns an ID to the localization results of each particle group, which it obtains by clustering the Directions of Arrival with a particle filter. While this node enables more robust sound source tracking than SourceTracker, its computational cost is higher. Note that the estimation result varies from run to run, because the particle state transition is modeled nonlinearly with a random walk. The estimated sound source location is updated every frame.

6.2.20.2 Necessary file

No files are required.

6.2.20.3 Usage

When to use

Since the directions of arrival obtained by a sound source localization node such as LocalizeMUSIC are discrete values and fluctuate, the localization results need to be tracked. SourceTrackerPF clusters the Directions of Arrival with particle filters and gives a sound source ID to each particle group, for sound sources whose power is higher than the threshold value. Use this node when sound source tracking does not work well with SourceTracker. While this node enables more robust sound source tracking than SourceTracker, its computational cost is higher. Reducing the number of particles, one of the parameters, shortens the processing time. Tune this parameter with the trade-off against estimation accuracy in mind.

Typical connection

Normally, connect the output of a sound source localization node such as ConstantLocalization or LocalizeMUSIC to the input of this node. The localization results are then given appropriate sound source IDs, so connect the output to GHDSS, the sound source separation node that relies on the localization results, or to DisplayLocalization, the localization result display node.

Figure 6.49 shows a connection example.


Figure 6.49: Connection example of SourceTrackerPF

6.2.20.4 Input-output and property of the node

Input

INPUT: : Vector<ObjectRef> type. Sound source localization results with no sound source ID.

Output

OUTPUT: : Vector<ObjectRef> type. Sound source localization results with sound source ID.

Parameter

Table 6.35: Parameter list of SourceTrackerPF

Parameter name	Type	Default value	Unit	Description
THRESH_SOURCE_POWER	`float`			The threshold to determine whether or not the localization result should be ignored as noise.
TOTAL_PARTICLE	`int`	1000		The number of scattered particles.
SOURCE_MAX	`int`	2		The maximum number of sound sources.
ANOTHER_SOURCE	`float`	0.00001		The likelihood threshold of a particle group considered as the same sound source.
IGNORE_SOURCE	`float`	0.00001		The likelihood threshold to generate a new sound source.
REMOVE_SOURCE	`int`	150	[frame]	The number of frames for which a particle group is kept without an associated observation value.
OUTPUT_RANGE	`float`	3.0	[deg]	The angle range to recognize particles as the ones in the group’s neighborhood.
LIKELIHOOD_SIGMA	`float`	1.0	[deg]	The variance of probability distribution assumed to obtain the likelihood of the observation value.
STATE_UPDATE_SIGMA	`float`	1.0	[deg]	The variance of a Random Walk.
SUM_W	`float`	0.4		The parameter to avoid divergence.
HISTORY_LOG	`bool`	false		Whether to show the history log.

THRESH_SOURCE_POWER: : float type. The threshold used to decide whether a localization result is ignored as noise. When the MUSIC power is smaller than this value, the result is treated as noise and is not output. Values that are too small let a lot of noise through to the output, whereas values that are too large make it difficult to localize the target sound source. Tune the parameter to find a value that balances this trade-off.
TOTAL_PARTICLE: : int type. The total number of particles. Increasing this value raises the computational cost, while a value that is too small worsens the accuracy with which the probability distribution is estimated.
SOURCE_MAX: : int type. The maximum number of sound sources. Increasing this value does not change the total number of particles; the particles are divided among the sound sources.
ANOTHER_SOURCE: : float type. The likelihood threshold for a particle group to be considered the same sound source. When the maximum likelihood within the particle group falls below this value, the group is no longer associated with the observation value.
IGNORE_SOURCE: : float type. The likelihood threshold for generating a new sound source. A new sound source is prepared when the likelihood for all particle groups is below this value.
REMOVE_SOURCE: : int type. The number of frames for which a particle group is kept without an associated observation value.
OUTPUT_RANGE: : float type. The angle range, measured from the center of gravity of the particle group, within which particles are regarded as the group’s neighborhood. Particles within this range are treated as belonging to the particle group.
LIKELIHOOD_SIGMA: : float type. The parameter for likelihood calculation: the variance of the probability distribution assumed when computing the likelihood of the observation value. A variance that is too large yields a high likelihood irrespective of the observation value. A variance that is too small lowers the likelihood and causes divergence.
STATE_UPDATE_SIGMA: : float type. The variance of the random walk used in the state transition. The random values follow a normal distribution.
SUM_W: : float type. The parameter to avoid divergence. It is the threshold that decides whether the particles are weighted. Because the calculation diverges when the importance is too small, the unweighted average of the particles in the group is output when the sum of the weights is smaller than this threshold.
HISTORY_LOG: : bool type. Setting the value to true shows the history log of the particle group. The default value is false.

6.2.20.5 Details of the node

First, this node checks whether the power of each input sound source localization result, which has no sound source ID, is less than the THRESH_SOURCE_POWER parameter value. Results whose MUSIC power is less than the threshold are considered noise and discarded.

For the results whose MUSIC power is equal to or greater than the THRESH_SOURCE_POWER parameter value, the node assigns a sound source ID and estimates the direction of the sound source with the following tracking method based on the particle filter.
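As an illustration of this thresholding, the following Python sketch filters one frame of localization results. The `Localization` record and the `drop_noise` function are hypothetical names introduced here, not part of HARK, and the power values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Localization:
    """Hypothetical stand-in for one localization result without an ID."""
    azimuth_deg: float
    elevation_deg: float
    power: float  # MUSIC power reported by the localization node


def drop_noise(results, thresh_source_power):
    """Keep results whose power is equal to or greater than the threshold."""
    return [r for r in results if r.power >= thresh_source_power]


# Example frame with made-up power values.
frame = [Localization(30.0, 0.0, 28.5), Localization(-120.0, 0.0, 19.0)]
kept = drop_noise(frame, thresh_source_power=25.0)
```

Only the results in `kept` would go on to the particle filter; the second result falls below the threshold and is discarded as noise.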

A sound source tracking method using particle filters

A particle filter expresses both the transition model $p(\boldsymbol{x}(t)\, |\, \boldsymbol{x}(t-1))$ and the observation model $p(y(t)\, |\, \boldsymbol{x}(t))$ as probabilistic models, where $\boldsymbol{x}(t)$ is the internal state and $y(t)$ is the observation vector. The $i$ th particle holds an internal state $\boldsymbol{x}_i(t)$ and an importance $w_i(t)$ that indicates how much the particle contributes to the sound source tracking result. The importance is generally defined as the likelihood.

The processing of this node consists of five steps: Initialization, Creation and Annihilation of a Sound Source, Importance Sampling, Selection, and Output.

Step 1 - Initialization

In the initialization, distribute all particles uniformly at random. In addition, to handle multiple sound sources, introduce particle groups and define the importance $w_i$ so that the following equations hold.

	$\displaystyle \sum_{i\in P_k} w_i = 1$		(19)
	$\displaystyle \sum_{k=1}^{S} N_k = N$		(20)

Here, $N_ k$ is the number of particles that $P_ k$ , the $k$ th particle group, has, $S$ is the number of sound sources, and $N$ is the total number of particles.
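The two constraints can be checked with a short Python sketch. It assumes an equal split of particles across groups, an azimuth range of -180 to 180 degrees, and an elevation range of -90 to 90 degrees; none of these choices is stated in this document, and `init_particles` is a hypothetical helper.

```python
import random


def init_particles(total_particle, source_count, rng):
    """Return one list per group; each particle is [azimuth, elevation, weight]."""
    base, extra = divmod(total_particle, source_count)
    groups = []
    for k in range(source_count):
        # Group sizes differ by at most one and add up to N (equation 20).
        n_k = base + (1 if k < extra else 0)
        group = [
            [rng.uniform(-180.0, 180.0), rng.uniform(-90.0, 90.0), 1.0 / n_k]
            for _ in range(n_k)
        ]  # Equal weights 1/N_k add up to 1 inside the group (equation 19).
        groups.append(group)
    return groups


groups = init_particles(1000, 3, random.Random(0))
sizes = [len(g) for g in groups]
```

Here `sizes` adds up to the total number of particles, and the weights inside each group add up to 1.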

Step 2 - Creation and Annihilation of a Sound Source

This step is for dealing with multiple sound sources. The internal state of the particle group $P_k$ is defined as follows.

$\displaystyle \hat{\boldsymbol{x}}_k(t) = \sum_{i\in P_k} \boldsymbol{x}_i \cdot w_i(t)$		(21)

When the $j$ th observation at time $t$ is $\boldsymbol{y}_j$, the angle between $\boldsymbol{y}_j$ and $\hat{\boldsymbol{x}}_k(t)$ is $\angle \theta$, and the angle threshold derived from the ANOTHER_SOURCE parameter is $\angle \theta_{th}$, perform the following processing.

Associate $\boldsymbol{y}_j$ with $P_k$ if $\angle \theta < \angle \theta_{th}$ is true.
Create a new particle group if no particle group associated with $\boldsymbol{y}_j$ is found.
Annihilate $P_k$ if no observation associated with the particle group $P_k$ is obtained within the time period specified in the REMOVE_SOURCE parameter.
In any case, redistribute the particles so that equations (19) and (20) hold.

Step 3 - Importance sampling

The flow of the importance sampling is as follows.

Estimate the state $\boldsymbol{x}_i(t)$ from $\boldsymbol{x}_i(t-1)$ using the transition model, $p(\boldsymbol{x}(t)|\boldsymbol{x}(t-1))$ .
Update the importance $w_i(t)$ using equation (25).
Normalize $w_i(t)$ according to equations (19, 20).

The transition models of the azimuth angle $\theta_i(t)$ and the elevation angle $\phi_i(t)$ of the sound source direction, which are the elements of $\boldsymbol{x}_i(t)$, are defined as the following random walk process.

	$\displaystyle \theta_i(t) = \theta_i(t-1) + r_\theta$		(22)
	$\displaystyle \phi_i(t) = \phi_i(t-1) + r_\phi$		(23)

$r_*$ is a random number drawn from a normal distribution whose variance is specified by the STATE_UPDATE_SIGMA parameter.
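A minimal Python sketch of the random walk in equations (22) and (23) is shown below. The function takes a standard deviation in degrees; if STATE_UPDATE_SIGMA is a variance, as described above, its square root would be passed (both coincide for the default of 1.0). Wrapping the azimuth and clamping the elevation are assumptions added to keep the angles in range, and the function names are hypothetical.

```python
import random


def wrap_azimuth(deg):
    """Map an azimuth to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def random_walk_step(particles, std_deg, rng):
    """particles: list of (azimuth_deg, elevation_deg); returns the moved copies."""
    moved = []
    for theta, phi in particles:
        theta = wrap_azimuth(theta + rng.gauss(0.0, std_deg))        # equation (22)
        phi = max(-90.0, min(90.0, phi + rng.gauss(0.0, std_deg)))   # equation (23)
        moved.append((theta, phi))
    return moved


rng = random.Random(1)
moved = random_walk_step([(179.9, 0.0)] * 100, std_deg=1.0, rng=rng)
```

Because the noise is drawn afresh on every call, two runs with different seeds give different particle positions, which is why the estimation result of this node varies between runs.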

When the angle between $\boldsymbol{x}_i(t)$ and $\boldsymbol{y}_j$ is $\angle \psi$, the likelihood is defined as follows.

$\displaystyle l(t) = \exp\left( - \frac{\angle \psi^2}{2R} \right)$		(24)

Finally, update $w_i$ with the following equation.

$\displaystyle w_i(t) = l(t) \cdot w_i(t-1)$		(25)
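The likelihood and weight update can be sketched in Python as follows. The sketch assumes that $R$ in equation (24) is the variance set by LIKELIHOOD_SIGMA, that angles are in degrees, and that the angle between two directions is measured between their unit vectors. The function names and the fallback for an all-zero weight sum are hypothetical.

```python
import math


def angle_between_deg(dir_a, dir_b):
    """Angle in degrees between two (azimuth_deg, elevation_deg) directions."""
    def unit(az, el):
        az, el = math.radians(az), math.radians(el)
        return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

    a, b = unit(*dir_a), unit(*dir_b)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    return math.degrees(math.acos(dot))


def update_weights(states, weights, observation, variance):
    """Apply equations (24) and (25), then renormalize as in equation (19)."""
    new = [
        w * math.exp(-angle_between_deg(x, observation) ** 2 / (2.0 * variance))
        for x, w in zip(states, weights)
    ]
    total = sum(new)
    if total == 0.0:
        return list(weights)  # assumed fallback: keep the previous weights
    return [w / total for w in new]


states = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
weights = update_weights(states, [1.0 / 3.0] * 3, observation=(0.0, 0.0), variance=1.0)
```

Particles close to the observation gain weight, while the particle 10 degrees away ends up with a weight near zero.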

Step 4 - Selection

Update particles according to the importance $w_ i$ .

For each $i \in P_k$, the number of particles is updated by the following equation.

$\displaystyle N_{k_i} = \mathrm{round}(N_k \cdot w_i)$		(26)

This leaves $R_k$ particles that have not yet been updated.

$\displaystyle R_k = N_k - \sum_{i\in P_k} N_{k_i}$		(27)

These remaining particles are distributed according to the residual weight $R_{w_i}$.

$\displaystyle R_{w_i} = w_i - N_{k_i} \Big/ \sum_{j\in P_k} N_{k_j}$		(28)

Sampling Importance Resampling (SIR) is used.
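The selection step can be illustrated with the following Python sketch of residual resampling for one particle group. Note that it is a textbook variant rather than a transcription of equations (26) to (28): it uses `floor` instead of `round` so that the number of remaining particles is never negative, and it uses $N_k \cdot w_i$ minus the allocated count as the residual weight. The function name is hypothetical.

```python
import math
import random


def residual_resample(states, weights, rng):
    """Resample one particle group; weights are assumed to sum to 1."""
    n = len(states)
    counts = [math.floor(n * w) for w in weights]   # floor variant of equation (26)
    remaining = n - sum(counts)                     # equation (27)
    residuals = [n * w - c for w, c in zip(weights, counts)]
    if remaining > 0:
        # Hand out the remaining particles in proportion to the residual weights.
        for i in rng.choices(range(n), weights=residuals, k=remaining):
            counts[i] += 1
    resampled = [s for s, c in zip(states, counts) for _ in range(c)]
    m = len(resampled)
    return resampled, [1.0 / m] * m


particles, new_weights = residual_resample(
    ["a", "b", "c", "d"], [0.5, 0.25, 0.25, 0.0], random.Random(0)
)
```

In this example the weights map to whole particle counts, so the result is deterministic: state "a" is duplicated and state "d", which has zero weight, disappears.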

Step 5 - Output

Estimate the posterior probability $p(\boldsymbol{x}(t) \ | \ \boldsymbol{y}(t))$ from the density of the updated particles.

The internal state of the particle group for the sound source $k$ is estimated by the equation (21).

Repeat Step 2 to Step 5 until sound source tracking is completed.

Output the $\hat{\boldsymbol {x}}_ k(t)$ of each particle group as the estimation result of the sound source position.

A sound source ID is assigned to each particle group, so the same ID is carried over for as long as the particle group exists.
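The output estimate of equation (21) can be sketched in Python as below, together with the unweighted fallback described for the SUM_W parameter. This is a simplified illustration: it averages azimuth and elevation directly and therefore does not handle wrap-around at ±180 degrees, and the function name is hypothetical.

```python
def estimate_direction(states, weights, sum_w=0.4):
    """Return (azimuth, elevation) for one particle group.

    states: list of (azimuth_deg, elevation_deg); weights: matching importances.
    """
    total = sum(weights)
    if total < sum_w:
        # Weights too small to trust: fall back to the plain average.
        n = len(states)
        return (sum(a for a, _ in states) / n, sum(e for _, e in states) / n)
    azimuth = sum(a * w for (a, _), w in zip(states, weights)) / total
    elevation = sum(e * w for (_, e), w in zip(states, weights)) / total
    return (azimuth, elevation)


weighted = estimate_direction([(10.0, 0.0), (20.0, 4.0)], [0.75, 0.25])
fallback = estimate_direction([(10.0, 0.0), (20.0, 4.0)], [0.1, 0.1])
```

With normalized weights the division by `total` has no effect, so the first call reproduces equation (21); the second call shows the plain average used when the weight sum is below the threshold.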

6.2.20.6 References

(1) K. Nakadai, K. Nakajima, M. Murase, H. Okuno, Y. Hasegawa and H. Tsujino: “Tracking of Multiple Sound Source by Integration of Robot-Embedded and In-Room Microphone Arrays”, Journal of the Robotics Society of Japan, Vol.25, no.6 (2007).