This node takes in multichannel speech waveform data from a microphone array. The audio interface devices supported by this node are the RASP series manufactured by System In Frontier, Inc., the TD-BD-16ADUSB manufactured by Tokyo Electron Device, and ALSA-based devices (e.g. the RME Hammerfall DSP Multiface series). Furthermore, this node can receive an IEEE-float-formatted multichannel raw audio stream through a TCP/IP socket connection. For an introduction to the various devices, see Section 8.
No files are required.
When to use
Use this node when speech waveform data from a microphone array is the input to the HARK system.
Typical connection
Figure 6.1 shows an example usage of the AudioStreamFromMic node.
Overview of device
Among the devices that the AudioStreamFromMic node supports, the following are introduced with photos.
Wireless RASP
RME Hammerfall DSP series Multiface (Device corresponding to ALSA).
Figure 6.2 shows the appearance of the wireless RASP. It connects to the HARK system over Ethernet via a wireless LAN. Power is supplied to the wireless RASP by the attached AC adapter. Since the wireless RASP supports plug-in power, a plug-in-power microphone can be connected to its terminal directly. An advantage is that sound can be recorded easily without a microphone preamplifier.
Figures 6.3 and 6.4 show the appearance of the RME Hammerfall DSP series Multiface. The device communicates with a host PC through a 32-bit CardBus. Although a microphone can be connected to the device through a 6.3 mm TRS terminal, a microphone amplifier is used to ensure a sufficient input level (Figure 6.4). For example, the user may connect a microphone to an RME OctaMic II and connect the OctaMic II to the Multiface. The OctaMic II supports phantom power, so a condenser microphone that requires phantom power (e.g. DPA 4060-BM) can be connected directly. However, since it cannot supply plug-in power, a battery box is required to connect plug-in-power microphones. For example, such battery boxes are supplied with the Sony ECM-C115 and the audio-technica AT9903.
Parameter name | Type | Default value | Unit | Description
LENGTH | int | 512 | [pt] | Frame length as a fundamental unit for processing.
ADVANCE | int | 160 | [pt] | Frame shift length.
CHANNEL_COUNT | int | 8 | [ch] | Number of microphone input channels of the device to be used.
SAMPLING_RATE | int | 16000 | [Hz] | Sampling frequency of the audio waveform data loaded.
DEVICETYPE | string | WS | | Type of device to be used.
GAIN | string | 0dB | | Gain value used with the RASP device.
DEVICE | string | 127.0.0.1 | | Character string needed to access the device: a device name such as "plughw:0,1", or an IP address when RASP is used.
Input
Not required.
Output
AUDIO: Matrix<float> type. Indexed, multichannel audio waveform data with rows as channels and columns as samples. The number of columns is equal to the parameter LENGTH.
NOT_EOF: bool type. This indicates whether there is still input from the waveform to be processed. It is used as an ending flag when processing the waveforms in a loop. When it is true, waveforms are being loaded, and when it is false, reading is complete. For this node, true is output continuously.
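The relationship between the frame parameters and the output matrix can be sketched as follows. This is only an illustration, assuming that consecutive frames start ADVANCE samples apart and that each frame holds LENGTH samples per channel; the function name `frames` is hypothetical, and how the node itself buffers the very first frame is not described here.

```python
from typing import Iterator, List


def frames(samples: List[List[float]],
           length: int = 512,
           advance: int = 160) -> Iterator[List[List[float]]]:
    """Yield blocks with one row per channel and `length` columns.

    `samples` holds one list of samples per channel (rows are channels).
    Each new block starts `advance` samples after the previous one, so
    neighbouring blocks overlap by `length - advance` samples.
    """
    total = len(samples[0])
    start = 0
    while start + length <= total:
        yield [channel[start:start + length] for channel in samples]
        start += advance
```

For instance, `list(frames(two_channel_data))` on a two-channel recording produces blocks with 2 rows and 512 columns each when the defaults are used.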
Parameter
LENGTH: int type. The default value is 512. Designates the frame length, which is the base unit of processing, in number of samples. The higher the value, the higher the frequency resolution, but the lower the temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. The default value corresponds to 32 [ms] at a sampling frequency of 16,000 [Hz].
ADVANCE: int type. The default value is 160. Designates the frame shift length in samples. The default value corresponds to a frame shift of 10 [ms] at a sampling frequency of 16,000 [Hz].
CHANNEL_COUNT: int type. The number of channels of the device to be used.
SAMPLING_RATE: int type. The default value is 16000. Designates the sampling frequency (the number of samples taken per second) of the loaded waveforms. When frequencies up to $\omega $ [Hz] are needed for processing, set the sampling frequency to over $2\omega $ [Hz]. A higher sampling frequency generally increases the amount of data, which makes real-time processing more difficult.
DEVICETYPE: string type. Select from ALSA, RASP, WS, TDBD16ADUSB, RASP24-16, RASP24-32, RASP-LC, and NETWORK. When a device supporting ALSA-based drivers is used, select ALSA. When RASP-2 is used, select RASP. When wireless RASP is used, select WS. When TD-BD-16ADUSB is used, select TDBD16ADUSB. When RASP-24 is used with 16-bit quantization, select RASP24-16. When RASP-24 is used with 24-bit quantization, select RASP24-32. When RASP-LC is used with a wireless connection to your PC, select RASP-LC. (If RASP-LC is directly connected to your PC, select ALSA.) To receive an IEEE-float-formatted raw audio stream via a TCP/IP socket connection, select NETWORK.
GAIN: string type. The default value is 0dB. This sets the microphone gain for the recording. Select from 0dB, 12dB, 24dB, 36dB, 48dB. This parameter takes effect only when RASP-24 is used.
DEVICE: string type. Since the required input differs for each DEVICETYPE, see the device-specific descriptions below.
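The arithmetic behind the LENGTH, ADVANCE, and SAMPLING_RATE descriptions can be checked with a short sketch. The function name `describe_settings` and the dictionary keys are hypothetical; the frequency resolution is computed as the sampling rate divided by the frame length (the usual DFT bin spacing), which is an assumption about how the frames are analyzed downstream rather than something stated above.

```python
def describe_settings(length: int, advance: int, sampling_rate: int) -> dict:
    """Derive durations and frequency limits from the frame parameters."""
    return {
        # Duration of one frame and of one frame shift, in milliseconds.
        "frame_ms": 1000.0 * length / sampling_rate,
        "shift_ms": 1000.0 * advance / sampling_rate,
        # Highest frequency representable at this sampling rate (half of it).
        "max_frequency_hz": sampling_rate / 2,
        # Assumed DFT bin spacing for a frame of `length` samples.
        "frequency_resolution_hz": sampling_rate / length,
    }


defaults = describe_settings(length=512, advance=160, sampling_rate=16000)
# frame_ms is 32.0 and shift_ms is 10.0, matching the values quoted above.
```

With the defaults, frequencies up to 8,000 [Hz] are available, which is why a higher SAMPLING_RATE is needed when higher frequencies matter.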
HARK supports the following audio inputs:
The following RASP series manufactured by System In Frontier, Inc.
RASP-2
Wireless RASP
RASP-24
RASP-LC
TD-BD-16ADUSB manufactured by Tokyo Electron Device Co., Ltd.
ALSA-based devices. The following devices are the examples.
Kinect Xbox (manufactured by Microsoft)
PlayStation Eye (manufactured by Sony)
Microcone (manufactured by Dev-Audio)
RME Hammerfall DSP series Multiface
Raw audio stream via TCP/IP socket connection (IEEE float wav format)
The following are parameter settings for each device.
RASP series:
Parameter settings for using RASP-2
CHANNEL_COUNT | 8
DEVICETYPE | WS
DEVICE | IP address of RASP-2
Parameter settings for using Wireless RASP
CHANNEL_COUNT | 16
DEVICETYPE | WS
DEVICE | IP address of Wireless RASP
Remarks | Some models of the RASP series have both microphone inputs and line inputs among the 16 channels. When such a model is used, a ChannelSelector node needs to be connected to the AUDIO output of the AudioStreamFromMic node so that only the microphone input channels are selected.
Parameter settings for using RASP-24
CHANNEL_COUNT | Multiples of 9
DEVICETYPE | RASP24-16 or RASP24-32
DEVICE | IP address of RASP-24
Remarks | Set DEVICETYPE=RASP24-16 to record with 16-bit quantization. Set DEVICETYPE=RASP24-32 to record with 24-bit quantization. CHANNEL_COUNT should be a multiple of 9. Channels 0 to 7 are microphone channels. Channel 8 is a line input. For microphone array processing, a ChannelSelector node needs to be connected to the AUDIO output of the AudioStreamFromMic node so that only the microphone input channels are selected.
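The channel indices to keep for the RASP-24 can be listed with a small helper. This is a sketch under a labeled assumption: when CHANNEL_COUNT is larger than 9, the layout of 8 microphone channels followed by 1 line input is assumed to repeat for every block of 9 channels, which the remarks above do not state explicitly. The function name `microphone_channels` is hypothetical.

```python
def microphone_channels(channel_count: int) -> list:
    """Return the channel indices assumed to carry microphone input.

    Assumption: every block of 9 channels consists of 8 microphone
    channels (offsets 0 to 7) followed by 1 line input (offset 8).
    """
    if channel_count <= 0 or channel_count % 9 != 0:
        raise ValueError("CHANNEL_COUNT must be a positive multiple of 9")
    return [ch for ch in range(channel_count) if ch % 9 != 8]
```

For CHANNEL_COUNT = 9 this yields channels 0 to 7, which are the indices to select in the ChannelSelector node.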
Parameter settings for using RASP-LC
CHANNEL_COUNT | 8
DEVICETYPE | ALSA or RASP-LC
DEVICE | If DEVICETYPE=ALSA, the DEVICE parameter should be plughw:a,b. Please refer to “Device corresponding to ALSA” for details of this setting. If DEVICETYPE=RASP-LC, the DEVICE parameter should be the IP address of RASP-LC.
Remarks | If the RASP-LC is connected directly to the USB interface of the PC, set DEVICETYPE=ALSA. If the RASP-LC is connected to the PC through the wireless LAN, set DEVICETYPE=RASP-LC. All the channels are microphone channels.
Devices manufactured by Tokyo Electron Device LTD.:
Parameter settings for using TD-BD-16ADUSB
CHANNEL_COUNT | 16
DEVICETYPE | TDBD16ADUSB
DEVICE | TDBD16ADUSB
Device corresponding to ALSA:
To use ALSA devices, designate plughw:a,b as the DEVICE parameter, where a and b are non-negative integers. Set a to the card number shown by arecord -l. When multiple audio input devices are connected, several card numbers are listed; enter the one to be used. Set b to the device number shown by arecord -l. When a card exposes multiple devices, enter the number of the one to be used. A card that has both analog and digital inputs is one example.
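As a rough sketch, candidate DEVICE strings can be derived from the text printed by `arecord -l`, filling a from the `card` field and b from the `device` field of each line. The sample text below is illustrative, not captured from a real machine, and the helper name `list_plughw_names` is hypothetical. The pattern assumes the English output format; a localized system may print different words.

```python
import re

# Matches lines such as "card 1: DSP [Hammerfall DSP], device 0: ...".
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): ")


def list_plughw_names(arecord_output: str) -> list:
    """Build a plughw:a,b string for every capture device listed."""
    names = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            names.append("plughw:{},{}".format(match.group(1), match.group(2)))
    return names


# Illustrative sample in the layout that `arecord -l` prints.
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: Analog [Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: DSP [Hammerfall DSP], device 0: RME Hammerfall DSP [RME Hammerfall DSP]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

candidates = list_plughw_names(SAMPLE)
```

On a real system the input text could be obtained with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout`.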
Parameter settings for using Kinect Xbox
CHANNEL_COUNT | 4
DEVICETYPE | ALSA
DEVICE | plughw:a,b
Parameter settings for using PlayStation Eye
CHANNEL_COUNT | 4
DEVICETYPE | ALSA
DEVICE | plughw:a,b
Parameter settings for using Microcone
CHANNEL_COUNT | 7
DEVICETYPE | ALSA
DEVICE | plughw:a,b
Parameter settings for using RME Hammerfall DSP Multiface series
CHANNEL_COUNT | 8
DEVICETYPE | ALSA
DEVICE | plughw:a,b
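The four ALSA tables above differ only in CHANNEL_COUNT, so they can be summarized as a lookup. This is a convenience sketch rather than part of HARK; the names `ALSA_CHANNEL_COUNTS` and `alsa_settings` are hypothetical, and the card and device numbers must still come from arecord -l.

```python
# CHANNEL_COUNT values copied from the ALSA tables above.
ALSA_CHANNEL_COUNTS = {
    "Kinect Xbox": 4,
    "PlayStation Eye": 4,
    "Microcone": 7,
    "RME Hammerfall DSP Multiface": 8,
}


def alsa_settings(device_name: str, card: int, device: int) -> dict:
    """Return the AudioStreamFromMic parameters for a listed ALSA device."""
    return {
        "CHANNEL_COUNT": ALSA_CHANNEL_COUNTS[device_name],
        "DEVICETYPE": "ALSA",
        "DEVICE": "plughw:{},{}".format(card, device),
    }
```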
Socket connection (if DEVICETYPE=NETWORK is selected):
DEVICE should be the IP address of the machine that sends the audio stream. Other parameters should be set according to the format of the audio stream. If the audio has $M$ channels and $T$ samples can be obtained at once, the audio stream can be sent by a program like the following pseudocode.
WHILE(1){
    X = Get_Audio_Stream    (Suppose X is a T-by-M matrix.)
    FOR t = 0 to T - 1
        FOR m = 0 to M - 1
            DATA[M * t + m] = X[t][m]
        ENDFOR
    ENDFOR
    send(socket_id, (char*)DATA, M * T * sizeof(float), 0)
}
Here, $X$ is an IEEE-float-formatted audio stream. Therefore, $-1 \leq X \leq 1$.
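The interleaving step of the sender can be sketched in Python. This is an illustration under stated assumptions: samples are packed as 32-bit floats in little-endian byte order (the pseudocode sends the sender's native float layout, so this matches typical x86 machines only), and the socket passed in is already connected. The port number and which side opens the connection are not specified in this section, so they are left out. The names `pack_block` and `send_block` are hypothetical.

```python
import socket
import struct


def pack_block(block) -> bytes:
    """Interleave a T-by-M block (T samples, M channels) into float32 bytes.

    The output order is sample 0 of every channel, then sample 1 of every
    channel, and so on, which is the DATA[M * t + m] layout.
    """
    num_channels = len(block[0])
    flat = []
    for row in block:
        if len(row) != num_channels:
            raise ValueError("every sample row must have M channel values")
        flat.extend(row)
    # "<" = little-endian (assumption), "f" = 32-bit IEEE float.
    return struct.pack("<{}f".format(len(flat)), *flat)


def send_block(connection: socket.socket, block) -> None:
    """Send one packed block over an already connected socket."""
    connection.sendall(pack_block(block))
```

A block of T samples and M channels always occupies M * T * 4 bytes, which corresponds to the `M * T * sizeof(float)` length in the pseudocode.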
Device corresponding to DirectSound on Windows OS:
On Windows, this node can accept DirectSound devices in addition to the wireless RASP, RASP-24, and socket connection. You can designate these devices by entering the device name in the DEVICE parameter. Note that multi-byte characters cannot be used in the DEVICE parameter.
There are two ways to find the device name. One is to use Device Manager, and the other is to use “Sound Device List”, which is provided by HARK. To use Sound Device List, click [Start] $\rightarrow $ [Programs] $\rightarrow $ [HARK] $\rightarrow $ [Sound Device List]. It then lists the names of all sound devices connected to your PC, as in Figure 6.5. You can also use a partial name. For instance, if “Hammerfall DSP” is listed, you can use “Hammerfall” for the DEVICE parameter. AudioStreamFromMic uses the topmost entry on the list when two or more candidates match.
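The partial-name rule can be illustrated with a short sketch. It assumes a plain, case-sensitive substring match, which is a guess about the actual matching behavior; the function name `select_device` and every list entry other than “Hammerfall DSP” are hypothetical.

```python
from typing import List, Optional


def select_device(listed_names: List[str], device_parameter: str) -> Optional[str]:
    """Return the first listed name containing the DEVICE string.

    Entries are checked from the top of the list, so the topmost match
    wins when several names contain the same text.
    """
    for name in listed_names:
        if device_parameter in name:
            return name
    return None


names = ["Example Onboard Audio", "Hammerfall DSP", "Hammerfall DSP (2)"]
chosen = select_device(names, "Hammerfall")
```

Here `chosen` is the first of the two matching entries, so a more specific DEVICE string is needed to reach the second one.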
For three devices (Kinect Xbox, PlayStation Eye, and Microcone), you can use the DEVICE values shown in the next section.
Device corresponding to ASIO on Windows OS:
For ASIO devices, such as the Microcone or the RME Hammerfall DSP series Multiface, you need to download and install the HARK ASIO Plugin from the HARK web page. In this case, use AudioStreamFromASIO instead of AudioStreamFromMic.
Device corresponding to DirectSound on Windows OS:
Parameter settings for using Kinect Xbox on Windows OS
CHANNEL_COUNT | 4
DEVICETYPE | DS
DEVICE | kinect
Parameter settings for using PlayStation Eye on Windows OS
CHANNEL_COUNT | 4
DEVICETYPE | DS
DEVICE | pseye
Parameter settings for using Microcone on Windows OS
CHANNEL_COUNT | 7
DEVICETYPE | DS
DEVICE | microcone
Parameter settings for using TAMAGO on Windows OS
CHANNEL_COUNT | 8
DEVICETYPE | DS
DEVICE | TAMAGO or tamago
Parameter settings for using RASP-ZX on Windows OS
CHANNEL_COUNT | 8 or 16
DEVICETYPE | DS
DEVICE | rasp
Parameter settings for using RME Hammerfall DSP Multiface series on Windows OS
CHANNEL_COUNT | 8
DEVICETYPE | ASIO
DEVICE | ASIO Hammerfall DSP