HARK Cookbook: MSLS+MSLS+Power+Preprocessing

14.5.7 MSLS+ $\Delta$ MSLS+ $\Delta$ Power+Preprocessing

An execution example is shown in Figure 14.30. After execution, a file named MFBANK27_0.spec is generated. This file stores a sequence of 27-dimensional vectors as little-endian 32-bit floating-point numbers. If feature extraction does not work well, check that f101b001.wav is in the data directory.
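
The file layout described above can be read back with a few lines of Python. This is a minimal sketch, assuming the file holds only the raw vectors with no header; the function name `read_spec` is hypothetical and not part of HARK.

```python
import struct


def read_spec(path, dim=27):
    """Read a file of little-endian 32-bit float vectors.

    Assumption: the file contains nothing but the vectors themselves
    (no header), so its size is a multiple of 4 * dim bytes.
    Returns a list of frames, each a tuple of `dim` floats.
    """
    frame_bytes = 4 * dim
    fmt = "<%df" % dim  # '<' = little endian, 'f' = 32-bit float
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % frame_bytes != 0:
        raise ValueError("file size is not a multiple of one %d-dim frame" % dim)
    return [
        struct.unpack_from(fmt, data, offset)
        for offset in range(0, len(data), frame_bytes)
    ]
```

For example, `frames = read_spec("MFBANK27_0.spec")` would return one 27-element tuple per frame, and `len(frames)` gives the number of frames that were written.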

Figure 14.30: Execution example of demo6.sh.

> ./demo.sh 6
UINodeRepository Scan()
Scanning def /usr/lib/flowdesigner/toolbox
done loading def files
loading XML document from memory
done!
Building network MAIN

Seventeen modules are included in this sample. There are three modules in MAIN_LOOP (iterator) and fourteen modules in MAIN (subnet). MAIN (subnet) and MAIN_LOOP (iterator) are shown in Figures 14.31 and 14.32. In outline, this is a simple network configuration in which MSLSExtraction calculates acoustic features from the audio waveforms collected by the AudioStreamFromWave module, and SaveFeatures writes them to a file.

Since pre-emphasis is performed in the time domain, the audio waveforms are first analyzed by MultiFFT, their type is converted by MatrixToMap, and the signals are resynthesized once by Synthesize. PreEmphasis then applies pre-emphasis to the synthesized waveforms, which are analyzed by MultiFFT once more, converted by PowerCalcForMap, and sent to MSLSExtraction. Since MSLSExtraction requires both the outputs of the mel-scale filter bank and the power spectra to calculate MSLS, the collected audio waveforms are analyzed by MultiFFT, their data type is converted by MatrixToMap and PowerCalcForMap, and MelFilterBank then computes the outputs of the mel-scale filter bank.

MSLSExtraction reserves a storage region for the $\Delta$ MSLS coefficients in addition to the MSLS coefficients and outputs the resulting vectors as features (the storage region for the $\Delta$ MSLS coefficients is filled with zeros). Since the USE_POWER property is set to true, the region reserved for the $\Delta$ coefficients holds both $\Delta$ MSLS and the delta power term. Therefore, the output feature vectors have twice as many dimensions as the value of the FBANK_COUNT property of MSLSExtraction plus one, that is, 2 × (FBANK_COUNT + 1). After the vectors pass through SpectralMeanNormalization, which performs mean subtraction, the $\Delta$ MSLS coefficients and the delta power term are calculated and stored by Delta. Since the required coefficients are the MSLS coefficients, the $\Delta$ MSLS coefficients, and the delta power term, the unnecessary power term must be deleted. FeatureRemover is used to delete it. SaveFeatures saves the input FEATURE. The frontal localization result generated by ConstantLocalization is given to SOURCES.
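
The dimension bookkeeping described above can be traced with a short sketch. It assumes FBANK_COUNT is 13 (implied by the 27-dimensional output, 13 + 13 + 1), that the static block precedes the delta block, and that the SELECTOR value 13 is a zero-based index. The helper names `feature_layout` and `remove_features` are hypothetical and only mimic the roles of MSLSExtraction and FeatureRemover.

```python
def feature_layout(fbank_count, use_power=True):
    """Label each slot of the feature vector before FeatureRemover.

    Assumption: static coefficients come first, followed by a delta
    region of the same size.
    """
    static = ["MSLS[%d]" % i for i in range(fbank_count)]
    if use_power:
        static.append("power")
    delta = ["delta " + name for name in static]
    return static + delta


def remove_features(vector, selector):
    """Drop the entries whose zero-based indices appear in `selector`."""
    drop = set(int(i) for i in selector)
    return [v for i, v in enumerate(vector) if i not in drop]


FBANK_COUNT = 13  # assumption: implied by 27 = 13 + 13 + 1
labels = feature_layout(FBANK_COUNT)   # 2 * (13 + 1) = 28 slots
kept = remove_features(labels, [13])   # index 13 is the static power term
print(len(labels), len(kept))          # prints "28 27"
```

Under these assumptions, the only slot removed is the static power term, which leaves the 13 MSLS coefficients, the 13 $\Delta$ MSLS coefficients, and the delta power term.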

$\includegraphics{fig/recipes/demo-FeatureExtraction6-MAIN.png}$

Figure 14.31: MAIN (subnet)

$\includegraphics{fig/recipes/demo-FeatureExtraction6-MAIN_LOOP.png}$

Figure 14.32: MAIN_LOOP (iterator)

Table 14.18 summarizes the parameters of the network. The main modules are PreEmphasis, MSLSExtraction, SpectralMeanNormalization, Delta, and FeatureRemover. See the HARK document for details.

Table 14.18: Parameter list

Node name	Parameter name	Type	Value
Constant	VALUE	`subnet_param`	1
MAIN_LOOP	LENGTH	`subnet_param`	2
	ADVANCE	`subnet_param`	3
	SAMPLING_RATE	`subnet_param`	4
	FBANK_COUNT	`subnet_param`	5
	FBANK_COUNT1	`subnet_param`	6
	DOWHILE	`bool`	(empty)
PreEmphasis	LENGTH	`subnet_param`	LENGTH
	SAMPLING_RATE	`subnet_param`	SAMPLING_RATE
	PREEMCOEFF	`float`	0.97
	INPUT_TYPE	`string`	WAV
MSLSExtraction	FBANK_COUNT	`subnet_param`	FBANK_COUNT
	NORMALIZE_MODE	`string`	Cepstral
	USE_POWER	`bool`	`true`
SpectralMeanNormalization	FBANK_COUNT1	`subnet_param`	FBANK_COUNT1
Delta	FBANK_COUNT1	`subnet_param`	FBANK_COUNT1
FeatureRemover	SELECTOR	`Object`	`<Vector<float> 13>`
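
To illustrate the role of the PREEMCOEFF value in Table 14.18, the sketch below applies the common first-order pre-emphasis filter, y[n] = x[n] - a * x[n-1], to a time-domain signal. Whether the PreEmphasis module uses exactly this form, and how it treats the first sample, are assumptions; the function name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply first-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    Assumption: the sample before the first one is treated as zero.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out
```

A constant input such as `[1.0, 1.0, 1.0]` is reduced to roughly 0.03 after the first sample, which shows how the filter attenuates low-frequency content relative to high-frequency content.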
