HARK version 1.1.0 Document : AudioStreamFromMic

6.1.1 AudioStreamFromMic

Outline of the node

This node takes in multichannel speech waveform data from a microphone array. The audio interface devices supported by this node are the RASP series from System In Frontier, Inc. (previously JEOL SYSTEM TECHNOLOGY CO., LTD.), the Tokyo Electron Device TD-BD-16ADUSB, and ALSA-based devices (e.g. the RME Hammerfall DSP Multiface series). For an introduction to the various devices, see Section 8.

Necessary file

No files are required.

How to use

When to use

Use this node when speech waveform data from a microphone array is the input to the HARK system.

Typical connection

Figure 6.1 shows an example usage of the AudioStreamFromMic node.

$\includegraphics[width=.8\textwidth ]{fig/modules/AudioStreamFromMic}$

Figure 6.1: Connection example of AudioStreamFromMic

Overview of device

Among the devices that the AudioStreamFromMic node supports, the following are introduced with photos.

Radio RASP
RME Hammerfall DSP series Multiface (an ALSA-compatible device)

1. Radio RASP

Figure 6.2 shows the appearance of the Radio RASP. It connects to the HARK system through Ethernet over a wireless LAN. Power is supplied to the Radio RASP through the attached AC adapter. Because the Radio RASP supports plug-in power, a plug-in-power microphone can be connected directly to its terminals. An advantage is that sound can be recorded easily without a microphone preamplifier.

$\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-WL-RASP}$

Figure 6.2: Radio RASP

2. RME Hammerfall DSP Multiface series

Figures 6.3 and 6.4 show the appearance of the RME Hammerfall DSP series Multiface. The device communicates with a host PC through a 32-bit CardBus interface. Although a microphone can be connected to the device through a 6.3 mm TRS terminal, a microphone amplifier is used to ensure a sufficient input level (Figure 6.4). For example, the user may connect a microphone to an RME OctaMic II and connect the OctaMic II to the Multiface. The OctaMic II supports phantom power, so a condenser microphone that requires phantom power (e.g. DPA 4060-BM) can be connected directly. However, since it does not supply plug-in power, a battery box is required to connect plug-in-power microphones. Such battery boxes are supplied with, for example, the Sony ECM-C115 and the audio-technica AT9903.

$\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME}$

Figure 6.3: Front view of RME Hammerfall DSP Multiface

$\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME-back}$

Figure 6.4: Back view of RME Hammerfall DSP Multiface

Input-output and property of node

Table 6.2: Parameters of AudioStreamFromMic

Parameter name	Type	Default value	Unit	Description
LENGTH	`int`	512	[pt]	Frame length, the fundamental unit of processing.
ADVANCE	`int`	160	[pt]	Frame shift length.
CHANNEL_COUNT	`int`	8	[ch]	Number of microphone input channels of the device to use.
SAMPLING_RATE	`int`	16000	[Hz]	Sampling frequency of the audio waveform data to be loaded.
DEVICETYPE	`string`	WS		Type of device to be used.
GAIN	`string`	0dB		Gain value used with RASP devices.
DEVICE	`string`	127.0.0.1		Character string needed to access the device: a device name such as "plughw:0,1", or an IP address when RASP is used.

Input

Not required.

Output

AUDIO: Matrix<float> type. Indexed multichannel audio waveform data, with rows as channels and columns as samples. The number of columns equals the LENGTH parameter.
NOT_EOF: bool type. Indicates whether there is still input waveform data to be processed. It is used as the termination flag when waveforms are processed in a loop: true means waveforms are still being loaded, and false means reading is complete. This node outputs true continuously.
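The shape of the AUDIO output can be illustrated with a small framing routine. This is a minimal sketch in plain Python, not HARK code: the function name split_into_frames is hypothetical, the input is silent dummy data, and the sketch assumes that the first frame starts at sample 0 and that only complete frames are emitted, which this section does not specify.

```python
def split_into_frames(samples, length=512, advance=160):
    """Cut a multichannel recording into overlapping frames.

    samples: list of channels, each a list of floats (rows = channels).
    Returns a list of frames; each frame has one row per channel and
    `length` columns, mirroring the shape described for AUDIO.
    Assumption: framing starts at sample 0 and partial frames are dropped.
    """
    if not samples:
        return []
    n_samples = min(len(channel) for channel in samples)
    frames = []
    start = 0
    while start + length <= n_samples:
        frames.append([channel[start:start + length] for channel in samples])
        start += advance
    return frames


# 8 channels, 1 second of silence at 16000 Hz (illustrative data only).
recording = [[0.0] * 16000 for _ in range(8)]
frames = split_into_frames(recording)

# Number of frames, rows per frame (channels), columns per frame (samples).
print(len(frames), len(frames[0]), len(frames[0][0]))  # prints: 97 8 512
```

Consecutive frames overlap by LENGTH - ADVANCE samples, which is 352 samples with the default values.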

Parameter

LENGTH: int type. The default value is 512. Designates the frame length, the base unit of processing, in number of samples. The higher the value, the higher the frequency resolution but the lower the temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value corresponds to 32 [ms].
ADVANCE: int type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value corresponds to 10 [ms].
CHANNEL_COUNT: int type. The default value is 8. The number of channels of the device to be used.
SAMPLING_RATE: int type. The default value is 16000. Designates the sampling frequency, that is, the number of samples taken per second, of the loaded waveforms. When frequencies up to $\omega$ [Hz] are needed for processing, set the sampling frequency to more than $2\omega$ [Hz]. A higher sampling frequency generally increases the amount of data, which makes real-time processing more difficult.
DEVICETYPE: string type. The default value is WS. Select from ALSA, SINICH, RASP and WS. Select ALSA for a device that supports ALSA-based drivers, SINICH for the TD-BD-16ADUSB, RASP for the RASP-2, and WS for the Radio RASP.
DEVICE: string type. The value to enter depends on DEVICETYPE; see the per-device descriptions below.
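The frame arithmetic described for LENGTH, ADVANCE and SAMPLING_RATE can be checked with a short calculation. The following is a minimal sketch, not part of HARK. The function name frame_timing is hypothetical, and the bin spacing assumes that a LENGTH-point discrete Fourier transform is applied to each frame, which this section does not state.

```python
def frame_timing(length=512, advance=160, sampling_rate=16000):
    """Return frame duration, frame shift, DFT bin spacing and Nyquist limit.

    Assumption: a `length`-point DFT is applied to each frame, so adjacent
    frequency bins are sampling_rate / length [Hz] apart.
    """
    return {
        "frame_ms": 1000.0 * length / sampling_rate,
        "shift_ms": 1000.0 * advance / sampling_rate,
        "bin_hz": sampling_rate / length,
        "nyquist_hz": sampling_rate / 2,
    }


timing = frame_timing()
print(timing["frame_ms"])    # 32.0 [ms] for the default LENGTH
print(timing["shift_ms"])    # 10.0 [ms] for the default ADVANCE
print(timing["bin_hz"])      # 31.25 [Hz] between adjacent bins
print(timing["nyquist_hz"])  # 8000.0 [Hz], highest representable frequency
```

Doubling LENGTH to 1024 halves the bin spacing to 15.625 [Hz] but stretches each frame to 64 [ms], which falls outside the 20 to 40 [ms] range recommended above.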

Details of the node

HARK supports the following three kinds of audio devices:

The RASP series from System In Frontier, Inc. (previously JEOL System Technology Co., Ltd.)
Tokyo Electron Device Ltd. TD-BD-16ADUSB
ALSA-based devices (e.g. RME Hammerfall DSP series Multiface)

The following are settings and instructions for each device.

RASP series

The parameter settings for the RASP-2 and the Radio RASP are described here.

RASP-2
CHANNEL_COUNT	8
DEVICETYPE	RASP
DEVICE	IP address of RASP-2
Radio RASP
CHANNEL_COUNT	16
DEVICETYPE	WS
DEVICE	IP address of Radio RASP
Remarks	Some models of the RASP series mix microphone inputs and line inputs among their 16 channels. When such a model is used, connect a ChannelSelector node to the AUDIO output of the AudioStreamFromMic node and select only the microphone input channels.
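The effect of selecting only the microphone channels can be sketched on a single frame. This is an illustration in plain Python, not the ChannelSelector implementation. The function name select_channels is hypothetical, and the layout used here (channels 0 to 7 as microphone inputs) is an assumption made for the example; the actual layout depends on the RASP model.

```python
def select_channels(frame, channel_ids):
    """Keep only the rows of `frame` listed in `channel_ids`, in that order.

    frame: list of channels (rows), each a list of samples (columns).
    """
    return [frame[channel] for channel in channel_ids]


# A 16-channel frame with 4 samples per channel; every sample in a row
# holds the channel index so the selection is easy to verify.
frame = [[float(channel)] * 4 for channel in range(16)]

# Hypothetical layout: channels 0-7 are microphone inputs, 8-15 line inputs.
mic_channels = list(range(8))
mic_frame = select_channels(frame, mic_channels)

print(len(mic_frame), mic_frame[7][0])  # prints: 8 7.0
```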

TD-BD-16ADUSB

CHANNEL_COUNT	16
DEVICETYPE	SINICH
DEVICE	SINICH

ALSA-compatible device

CHANNEL_COUNT	8
DEVICETYPE	ALSA
DEVICE	plughw:0,1
Remarks	Specify the device as plughw:a,b, where a and b are non-negative integers. For a, enter the card number shown by arecord -l. When multiple audio input devices are connected, multiple card numbers are listed; enter the number of the card to be used. For b, enter the device number shown by arecord -l. For a card that has multiple devices, enter the number of the device to be used. A card with both analog and digital inputs is one example.
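Building the DEVICE string from an arecord -l listing can be sketched as follows. This is a minimal illustration, not a HARK utility. The function name list_plughw_names is hypothetical, the listing text is a made-up sample in the general shape of arecord -l output, and the sketch assumes that a and b are the numbers in the "card" and "device" fields of each line.

```python
import re

# Made-up sample in the general shape of `arecord -l` output; the card and
# device names are illustrative, not taken from a real machine.
SAMPLE_LISTING = """\
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC269 Analog [ALC269 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: DSP [Hammerfall DSP], device 0: RME Hammerfall DSP [RME Hammerfall DSP]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Matches lines of the form "card <a>: ..., device <b>: ...".
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+):", re.MULTILINE)


def list_plughw_names(listing):
    """Return one "plughw:a,b" string per card/device line in `listing`."""
    return ["plughw:%s,%s" % (card, device)
            for card, device in CARD_LINE.findall(listing)]


names = list_plughw_names(SAMPLE_LISTING)
print(names)  # prints: ['plughw:0,0', 'plughw:1,0']
```

On a real machine the listing would come from running arecord -l, and the entry for the intended capture device would be entered as the DEVICE parameter.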