HARK Document Version 3.4.0. (Revision: 9509) : SpeechRecognitionSMNClient

6.6.2 SpeechRecognitionSMNClient

6.6.2.1 Outline of the node

This node sends acoustic features to the speech recognition node via a network connection. The main difference from SpeechRecognitionClient is that this node performs mean subtraction (Spectral Mean Normalization: SMN) on the input feature vectors. Because the mean is computed over the entire utterance section, nothing is sent until the utterance has ended, even when the node is used online, so the processing is not real-time. To achieve real-time processing, the mean of the current utterance must be estimated or approximated from other values, without waiting for the feature values of the entire utterance section. For the details of the approximation processing, see Details of the node.
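The utterance-level mean subtraction described above can be sketched as follows. This is only an illustrative sketch, not the node's actual implementation: the function name `spectral_mean_normalize` and the representation of an utterance as a list of per-frame feature lists are assumptions made for the example. It also shows why the whole utterance must be available before anything can be sent: the mean depends on every frame.

```python
from typing import List


def spectral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-dimension mean, taken over the whole utterance,
    from every feature vector (hypothetical helper for illustration)."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # The mean needs all frames, so it is only known once the utterance ends.
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Three frames of a two-dimensional feature; the per-dimension mean is [2.0, 20.0].
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = spectral_mean_normalize(utterance)
# normalized == [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
```

After normalization, each feature dimension has zero mean over the utterance.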

6.6.2.2 Necessary file

No files are required.

6.6.2.3 Usage

When to use

This node is used to send acoustic features to software outside of HARK. For example, it sends them to the large vocabulary continuous speech recognition software Julius $^{(1)}$ to perform speech recognition.

Typical connection

$\includegraphics[width=100mm]{fig/modules/SpeechRecognitionSMNClient}$

Figure 6.103: Connection example of SpeechRecognitionSMNClient

6.6.2.4 Input-output and property of the node

Table 6.89: Parameter list of SpeechRecognitionSMNClient

Parameter name	Type	Default value	Unit	Description
MFM_ENABLED	`bool`	`true`		Select whether or not to send out missing feature masks
HOST	`string`	127.0.0.1		Host name/IP address of the server on which Julius/Julian is running
PORT	`int`	5530		Port number for sending out to network
SOCKET_ENABLED	`bool`	`true`		The flag that determines whether or not to output to the socket

Input

FEATURES: Map<int, ObjectRef> type. Pairs of a sound source ID and a feature vector as Vector<float> type data.
MASKS: Map<int, ObjectRef> type. Pairs of a sound source ID and a mask vector as Vector<float> type data.
SOURCES: Vector<ObjectRef> type.

Output

OUTPUT: Vector<ObjectRef> type.

Parameter

MFM_ENABLED: bool type. When true is selected, MASKS is transmitted. When false is selected, the MASKS input is ignored and a mask of all 1's is transmitted.
HOST: string type. The host name or IP address of the host to which acoustic parameters are transmitted. Not used when SOCKET_ENABLED is set to false.
PORT: int type. The port number to which acoustic parameters are transmitted. Not used when SOCKET_ENABLED is set to false.
SOCKET_ENABLED: bool type. When true, acoustic parameters are transmitted to the socket; when false, they are not transmitted.

When both MFM_ENABLED and SOCKET_ENABLED are set to true, this node sends acoustic features and mask vectors to the speech recognition node via the network port. When false is selected for MFM_ENABLED, speech recognition that is not based on missing feature theory is performed. In practice, mask vectors are still sent, but with every element set to 1; in other words, all acoustic features are treated as reliable. When false is selected for SOCKET_ENABLED, the features are not sent to the speech recognition node. Because the speech recognition engine is an external program, this setting is used to check the operation of a HARK network without running that program. For HOST, designate the IP address of the host on which the external program that receives the vectors is running. For PORT, designate the network port number to which the vectors are sent.
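The interaction of MFM_ENABLED and SOCKET_ENABLED described above can be summarized in a small sketch. The function `build_payload` and its return shape are hypothetical and exist only for this illustration; the sketch models which feature and mask vectors would be sent for one sound source, and says nothing about the actual wire format used between HARK and the recognizer.

```python
from typing import List, Optional, Tuple


def build_payload(
    features: List[float],
    mask: List[float],
    mfm_enabled: bool,
    socket_enabled: bool,
) -> Optional[Tuple[List[float], List[float]]]:
    """Return the (features, mask) pair that would be sent, or None when
    nothing is sent. Hypothetical helper, not part of HARK."""
    if not socket_enabled:
        # SOCKET_ENABLED = false: nothing is sent; HOST and PORT are unused.
        return None
    if mfm_enabled:
        # MFM_ENABLED = true: the MASKS input is sent as given.
        sent_mask = list(mask)
    else:
        # MFM_ENABLED = false: the MASKS input is ignored and every
        # feature is marked as reliable with a mask of all 1's.
        sent_mask = [1.0] * len(features)
    return list(features), sent_mask


with_mask = build_payload([0.5, 0.2], [1.0, 0.0], mfm_enabled=True, socket_enabled=True)
all_ones = build_payload([0.5, 0.2], [1.0, 0.0], mfm_enabled=False, socket_enabled=True)
not_sent = build_payload([0.5, 0.2], [1.0, 0.0], mfm_enabled=True, socket_enabled=False)
```

Here `with_mask` carries the original mask, `all_ones` carries a mask of 1's regardless of the MASKS input, and `not_sent` is `None`.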

6.6.2.5 References:

(1) http://julius.sourceforge.jp/en_index.php