This node takes in multichannel speech waveform data from a microphone array. The supported audio interface devices are the RASP series from System In Frontier, Inc. (previously JEOL SYSTEM TECHNOLOGY CO., LTD.), the Tokyo Electron Device TD-BD-16ADUSB, and ALSA-based devices (e.g. the RME Hammerfall DSP Multiface series). For an introduction to various devices, see Section 8.
No files are required.
When to use
Use this node when speech waveform data from a microphone array is to be used as input to the HARK system.
Typical connection
Figure 6.1 shows an example usage of the AudioStreamFromMic node.
Overview of device
Among the devices that the AudioStreamFromMic node supports, the following are introduced with photos.
Radio RASP
RME Hammerfall DSP series Multiface (Device corresponding to ALSA).
Figure 6.2 shows the appearance of the radio RASP. It connects to the HARK system over Ethernet via a wireless LAN. Power is supplied to the radio RASP by the attached AC adapter. Since the radio RASP supports plug-in power, plug-in-power microphones can be connected to its terminals directly. An advantage is that sound can easily be recorded without a microphone preamplifier.
Figures 6.3 and 6.4 show the appearance of the RME Hammerfall DSP series Multiface. The device communicates with a host PC through a 32-bit CardBus. Although a microphone can be connected to the device through a 6.3 mm TRS terminal, a microphone amplifier is used to ensure a sufficient input level (Figure 6.4). For example, the user may connect a microphone to an RME OctaMic II and connect the OctaMic II to the Multiface. The OctaMic II supplies phantom power, so a condenser microphone that requires phantom power (e.g. DPA 4060-BM) can be connected directly. However, since it does not supply plug-in power, a battery box is required to connect plug-in-power microphones. Such battery boxes are included, for example, with the Sony ECM-C115 and the audio-technica AT9903.
Parameter name | Type | Default value | Unit | Description
LENGTH | int | 512 | [pt] | Frame length as a fundamental unit for processing.
ADVANCE | int | 160 | [pt] | Frame shift length.
CHANNEL_COUNT | int | 8 | [ch] | Number of microphone input channels of the device to be used.
SAMPLING_RATE | int | 16000 | [Hz] | Sampling frequency of the loaded audio waveform data.
DEVICETYPE | string | WS | | Type of device to be used.
GAIN | | 0dB | | Gain value used with a RASP device.
DEVICE | string | 127.0.0.1 | | Character string required to access the device: a device name such as "plughw:0,1", or an IP address when RASP is used.
Input
Not required.
Output
AUDIO: Matrix<float> type. Indexed multichannel audio waveform data with rows as channels and columns as samples. The number of columns equals the parameter LENGTH.
NOT_EOF: bool type. Indicates whether there is still waveform input to be processed. It is used as an ending flag when processing waveforms in a loop. When it is true, waveforms are being loaded; when it is false, reading is complete. This node outputs true continuously.
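To illustrate the layout of the waveform output described above, the following is a minimal Python sketch that cuts a multichannel signal into frames with one row per channel and LENGTH columns, shifted by ADVANCE samples. It is not how HARK reads from a device, and HARK's handling of the first and last frames may differ. The function name `split_into_frames` and the synthetic 8-channel signal are assumptions made only for this example.

```python
def split_into_frames(signal, length=512, advance=160):
    """Cut a multichannel signal into overlapping frames.

    signal: list of channels, each a list of samples (rows = channels).
    Returns a list of frames; each frame has one row per channel
    and `length` columns.
    """
    n_samples = len(signal[0])
    frames = []
    start = 0
    # Keep only frames that fit completely inside the signal.
    while start + length <= n_samples:
        frames.append([channel[start:start + length] for channel in signal])
        start += advance
    return frames


# Hypothetical input: 8 channels, 1 second at 16000 Hz.
# Each channel is filled with its own index so rows are easy to identify.
channels, rate = 8, 16000
signal = [[float(ch)] * rate for ch in range(channels)]

frames = split_into_frames(signal)
first = frames[0]
print(len(frames), len(first), len(first[0]))  # 97 8 512
```

Each element of `frames` corresponds to one frame of the waveform output: 8 rows (channels) by 512 columns (samples).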
Parameter
LENGTH: int type. The default value is 512. Designates the frame length, the base unit of processing, in number of samples. The higher the value, the higher the frequency resolution but the lower the temporal resolution. It is known that a length corresponding to [ms] is appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value corresponds to 32 [ms].
ADVANCE: int type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value corresponds to 10 [ms].
CHANNEL_COUNT: int type. The number of channels of the device to be used.
SAMPLING_RATE: int type. The default value is 16000. Designates the sampling frequency, that is, the number of samples taken per second, of the loaded waveforms. When frequencies up to f [Hz] are needed for processing, set the sampling frequency to over 2f [Hz]. A higher sampling frequency generally increases the amount of data, which makes real-time processing more difficult.
DEVICETYPE: string type. Select from ALSA, SINICH, RASP and WS. When a device supporting ALSA-based drivers is used, select ALSA. When TD-BD-16ADUSB is used, select SINICH. When RASP-2 is used, select RASP. When radio RASP is used, select WS.
DEVICE: string type. The required content differs for each DEVICETYPE; see the per-device settings described below.
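The relationship between LENGTH, ADVANCE and SAMPLING_RATE can be checked with a few lines of arithmetic. The following Python sketch converts the default frame length and shift to milliseconds and computes the upper frequency bound given by the sampling theorem (half the sampling frequency). The helper names `samples_to_ms` and `highest_usable_frequency` are hypothetical and are not part of HARK.

```python
def samples_to_ms(samples, sampling_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sampling_rate


def highest_usable_frequency(sampling_rate):
    """Upper bound from the sampling theorem: half the sampling rate."""
    return sampling_rate / 2.0


sampling_rate = 16000  # SAMPLING_RATE default [Hz]
length = 512           # LENGTH default [pt]
advance = 160          # ADVANCE default [pt]

print(samples_to_ms(length, sampling_rate))     # 32.0
print(samples_to_ms(advance, sampling_rate))    # 10.0
print(highest_usable_frequency(sampling_rate))  # 8000.0
print(sampling_rate / advance)                  # 100.0 frames per second
```

With the default values, one frame covers 32 ms, a new frame starts every 10 ms, and 100 frames must be processed per second to keep up with real-time input.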
HARK supports the following three kinds of audio devices:
The RASP series from System In Frontier, Inc. (previously JEOL System Technology Co., Ltd.)
The TD-BD-16ADUSB from Tokyo Electron Device LTD.
ALSA-based devices (e.g. RME Hammerfall DSP series Multiface)
The following are settings and instructions for each device.
RASP series
The parameter settings for RASP-2 and Radio RASP are described here.
RASP-2
CHANNEL_COUNT | 8
DEVICETYPE | WS
DEVICE | IP address of RASP-2

Radio RASP
CHANNEL_COUNT | 16
DEVICETYPE | WS
DEVICE | IP address of Radio RASP

Remarks: Some models of the RASP series have both microphone inputs and line inputs among the 16 channels. When such a model is used, a ChannelSelector node needs to be connected to the AUDIO output of the AudioStreamFromMic node so that only the microphone input channels are selected.
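The effect of selecting only the microphone channels can be pictured as keeping a subset of the rows of the 16-channel matrix. The Python sketch below is only a conceptual illustration, not the ChannelSelector implementation. The function `select_channels` and the channel layout (microphones on channels 0 to 7, line inputs on channels 8 to 15) are hypothetical, and the actual layout depends on the RASP model.

```python
def select_channels(frame, selected):
    """Return only the rows of `frame` whose indices are in `selected`."""
    return [frame[index] for index in selected]


# Hypothetical 16-channel frame with 4 samples per channel.
# Each row is filled with its channel index so rows are easy to identify.
frame = [[float(ch)] * 4 for ch in range(16)]

# Assumed layout for this example only: microphones on channels 0-7.
mic_channels = list(range(8))

mic_frame = select_channels(frame, mic_channels)
print(len(mic_frame), len(mic_frame[0]))  # 8 4
```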
TD-BD-16ADUSB
CHANNEL_COUNT | 16
DEVICETYPE | SINICH
DEVICE | SINICH
Device corresponding to ALSA
CHANNEL_COUNT | 8
DEVICETYPE | ALSA
DEVICE | plughw:0,1

Remarks: Designate the device as plughw:a,b, where a and b are non-negative integers. For a, enter the card number shown by arecord -l. When multiple audio input devices are connected, multiple card numbers are shown; enter the number of the card to be used. For b, enter the subdevice number shown by arecord -l. For a device that has multiple subdevices, enter the number of the subdevice to be used. A device with both analog and digital inputs is one example of a device with multiple subdevices.
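Since the expected DEVICE string depends on DEVICETYPE, a simple consistency check before starting a network can catch typos. The following Python sketch is one possible check based only on the settings listed above. The function `check_device` is hypothetical and is not part of HARK, and the check is deliberately strict: it accepts only the plughw:a,b form for ALSA, although the ALSA driver itself may accept other device names.

```python
import ipaddress
import re

# Matches the plughw:a,b form with non-negative integers a and b.
PLUGHW_PATTERN = re.compile(r"^plughw:(\d+),(\d+)$")


def check_device(devicetype, device):
    """Return True if `device` looks plausible for `devicetype`."""
    if devicetype == "ALSA":
        return PLUGHW_PATTERN.match(device) is not None
    if devicetype == "SINICH":
        return device == "SINICH"
    if devicetype in ("RASP", "WS"):
        # RASP devices are addressed by IP address.
        try:
            ipaddress.ip_address(device)
        except ValueError:
            return False
        return True
    return False  # unknown DEVICETYPE


print(check_device("ALSA", "plughw:0,1"))  # True
print(check_device("WS", "127.0.0.1"))     # True
print(check_device("WS", "plughw:0,1"))    # False
```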