6.6.2 SpeechRecognitionSMNClient

6.6.2.1 Outline of the node

This node sends acoustic features to the speech recognition node via a network connection. The main difference from SpeechRecognitionClient is that this node performs mean subtraction (Spectral Mean Normalization: SMN) on the input feature vectors. Because the mean is computed over the entire utterance section, nothing is sent until the utterance is over, even when the node is used online, and the processing is therefore not real-time. To realize real-time processing, the mean of the utterance concerned must be estimated or approximated from other values, without waiting for the feature values of the entire utterance section. For the details of the approximation processing, see Details of the node.
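The utterance-level mean subtraction described above can be written as $\hat{x}_t = x_t - \frac{1}{T}\sum_{\tau=1}^{T} x_\tau$, where $x_t$ is the feature vector of frame $t$ and $T$ is the number of frames in the utterance. The following is a minimal Python sketch of that operation only, not the node's actual implementation. The function name `subtract_utterance_mean` and the list-of-lists representation are assumptions made for illustration.

```python
def subtract_utterance_mean(frames):
    """Return a copy of frames with the per-dimension utterance mean removed.

    frames is a list of equal-length feature vectors, one per frame.
    The mean can only be computed once every frame of the utterance is
    available, which is why this form of SMN cannot run in real time.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each feature dimension over the whole utterance.
    mean = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    # Subtract that mean from every frame.
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Hypothetical three-frame utterance with two feature dimensions.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = subtract_utterance_mean(utterance)
```

After normalization, each feature dimension averages to zero over the utterance.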

6.6.2.2 Necessary file

No files are required.

6.6.2.3 Usage

When to use

This node is used to send acoustic features to software outside of HARK. For example, it sends them to the large vocabulary continuous speech recognition software Julius $^{(1)}$ to perform speech recognition.

Typical connection

\includegraphics[width=100mm]{fig/modules/SpeechRecognitionSMNClient}
Figure 6.103: Connection example of SpeechRecognitionSMNClient 

6.6.2.4 Input-output and property of the node

Table 6.89: Parameter list of SpeechRecognitionSMNClient 

| Parameter name | Type | Default value | Unit | Description |
|---|---|---|---|---|
| MFM_ENABLED | bool | true | | Select whether or not to send out missing feature masks |
| HOST | string | 127.0.0.1 | | Host name or IP address of the server on which Julius/Julian is running |
| PORT | int | 5530 | | Port number for sending out to network |
| SOCKET_ENABLED | bool | true | | The flag that determines whether or not to output to the socket |

Input

FEATURES

: Map<int, ObjectRef>  type. A pair of the sound source ID and feature vector as Vector<float>  type data.

MASKS

: Map<int, ObjectRef>  type. A pair of the sound source ID and mask vector as Vector<float>  type data.

SOURCES

: Vector<ObjectRef>  type.

Output

OUTPUT

: Vector<ObjectRef>  type.

Parameter

MFM_ENABLED

: bool type. When true is selected, MASKS is transmitted. When false is selected, the MASKS input is ignored and a mask of all 1's is transmitted.

HOST

: string type. The host name or IP address of the host to which acoustic parameters are transmitted. When SOCKET_ENABLED is set to false, it is not used.

PORT

: int type. The port number to which acoustic parameters are transferred. When SOCKET_ENABLED is set to false, it is not used.

SOCKET_ENABLED

: bool  type. When true, acoustic parameters are transmitted to the socket and when false, they are not transmitted.

When MFM_ENABLED and SOCKET_ENABLED are both set to true, this node sends acoustic features and mask vectors to the speech recognition node via the network port. When false is selected for MFM_ENABLED, speech recognition not based on missing feature theory is performed. In practice, the mask vectors are sent with every element set to 1; in other words, all acoustic features are treated as reliable. When false is selected for SOCKET_ENABLED, the features are not sent to the speech recognition node. Because the speech recognition engine is an external program, this setting is used to check the operation of a HARK network without running that program. For HOST, designate the IP address of the host on which the external program that receives the vectors is running. For PORT, designate the network port number to which the vectors are sent.
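The effect of the two flags can be sketched as follows. This is an illustrative Python model of the behaviour described above, not the node's code. `build_payload`, `transmit`, and the `send` callback are hypothetical names, and the actual wire format used towards the recognizer is deliberately left out because it is not specified in this section.

```python
def build_payload(features, masks, mfm_enabled):
    """Pair each source's feature vector with the mask to transmit.

    features and masks map a sound source ID to a list of floats.
    When mfm_enabled is False, the MASKS input is ignored and a mask of
    all 1's is used, i.e. every feature is treated as reliable.
    """
    payload = {}
    for source_id, feature in features.items():
        if mfm_enabled:
            mask = list(masks[source_id])
        else:
            mask = [1.0] * len(feature)
        payload[source_id] = (list(feature), mask)
    return payload


def transmit(payload, socket_enabled, send):
    """Pass each (source_id, feature, mask) to send; return the number sent.

    send stands in for the socket write (an assumption for this sketch).
    When socket_enabled is False nothing is sent, so the rest of the
    network can be checked without a running recognizer.
    """
    if not socket_enabled:
        return 0
    for source_id, (feature, mask) in sorted(payload.items()):
        send(source_id, feature, mask)
    return len(payload)


# Hypothetical single-source input.
features = {0: [0.5, -0.2, 1.3]}
masks = {0: [1.0, 0.0, 1.0]}
sent = []
payload = build_payload(features, masks, mfm_enabled=False)
count = transmit(payload, True, lambda sid, f, m: sent.append((sid, f, m)))
```

Keeping the mask selection separate from the transmission step mirrors the way MFM_ENABLED and SOCKET_ENABLED act independently of each other.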

6.6.2.5 References:

(1) http://julius.sourceforge.jp/en_index.php