6.1.1 AudioStreamFromMic

6.1.1.1 Outline of the node

This node takes in multichannel speech waveform data from a microphone array. The audio interface devices supported by this node are the RASP series manufactured by System In Frontier, Inc., the TD-BD-16ADUSB manufactured by Tokyo Electron Device, and ALSA-based devices (e.g., the RME Hammerfall DSP Multiface series). In addition, this node can receive an IEEE-float-formatted multichannel raw audio stream through a TCP/IP socket connection. For an introduction to the various devices, see Section 8.

6.1.1.2 Necessary file

No files are required.

6.1.1.3 How to use

When to use

Use this node when speech waveform data from a microphone array is to be the input to the HARK system.

Typical connection

Figure 6.1 shows an example usage of the AudioStreamFromMic  node.

\includegraphics[width=.8\textwidth ]{fig/modules/AudioStreamFromMic}
Figure 6.1: Connection example of AudioStreamFromMic 

Overview of device

Among the devices that the AudioStreamFromMic  node supports, the following are introduced with photos.

  1. Wireless RASP

  2. RME Hammerfall DSP series Multiface (Device corresponding to ALSA).

6.1.1.3.1 1. Wireless RASP

Figure 6.2 shows the appearance of the wireless RASP. It connects to the HARK system over Ethernet through a wireless LAN. Power is supplied to the wireless RASP by the attached AC adapter. Because the wireless RASP supports plug-in power, a plug-in-power microphone can be connected to its terminal directly. An advantage is that sound can be recorded easily without a microphone preamplifier.

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-WL-RASP}
Figure 6.2: Wireless RASP

6.1.1.3.2 2. RME Hammerfall DSP Multiface series

Figures 6.3 and 6.4 show the appearance of the RME Hammerfall DSP series Multiface. The device communicates with a host PC through a 32-bit CardBus. Although a microphone can be connected to the device through a 6.3 mm TRS terminal, a microphone amplifier is used to ensure a sufficient input level (Figure 6.4). For example, the user may connect a microphone to an RME OctaMic II and connect the OctaMic II to the Multiface. The OctaMic II supports phantom power, so a condenser microphone that requires phantom power (e.g., DPA 4060-BM) can be connected directly. However, since it does not supply plug-in power, a battery box for plug-in power is required to connect plug-in-power microphones. Such battery boxes are supplied with, for example, the Sony ECM-C115 and the audio-technica AT9903.

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME}
Figure 6.3: Front view of RME Hammerfall DSP Multiface

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME-back}
Figure 6.4: Back view of RME Hammerfall DSP Multiface

6.1.1.4 Input-output and property of node

Table 6.2: Parameters of AudioStreamFromMic

Parameter name   Type     Default value   Unit   Description
LENGTH           int      512             [pt]   Frame length as a fundamental unit for processing.
ADVANCE          int      160             [pt]   Frame shift length.
CHANNEL_COUNT    int      8               [ch]   Number of microphone input channels of the device to use.
SAMPLING_RATE    int      16000           [Hz]   Sampling frequency of the audio waveform data loaded.
DEVICETYPE       string   WS              -      Type of device to be used.
GAIN             string   0dB             -      Gain value used with a RASP device.
DEVICE           string   127.0.0.1       -      Character string needed to access the device: a device name such as "plughw:0,1", or an IP address when RASP is used.

Input

Not required.

Output

AUDIO

: Matrix<float> type. Indexed, multichannel audio waveform data with rows as channels and columns as samples. The number of columns is equal to the parameter LENGTH.

NOT_EOF

: bool type. Indicates whether there is still waveform input to be processed. It is used as the termination flag when waveforms are processed in a loop: true means that waveforms are still being loaded, and false means that reading is complete. This node outputs true continuously.

Parameter

LENGTH

: int type. The default value is 512. Designates the frame length, which is the base unit of processing, as a number of samples. The higher the value, the higher the frequency resolution, but the lower the temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value of 512 samples corresponds to 32 [ms].

ADVANCE

: int type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value of 160 samples corresponds to a frame period of 10 [ms].
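The relation between a length in samples and its duration in milliseconds, as used for LENGTH and ADVANCE, can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the helper name samples_to_ms is hypothetical and is not part of HARK.

```python
def samples_to_ms(num_samples, sampling_rate_hz):
    """Convert a length in samples to a duration in milliseconds."""
    return 1000.0 * num_samples / sampling_rate_hz


SAMPLING_RATE = 16000  # [Hz], default value of the node
LENGTH = 512           # [pt], default frame length
ADVANCE = 160          # [pt], default frame shift

frame_ms = samples_to_ms(LENGTH, SAMPLING_RATE)   # duration of one frame
shift_ms = samples_to_ms(ADVANCE, SAMPLING_RATE)  # time between frame starts
overlap = LENGTH - ADVANCE                        # samples shared by consecutive frames

print(frame_ms, shift_ms, overlap)  # 32.0 10.0 352
```

With the default values, consecutive frames therefore share 352 of their 512 samples.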

CHANNEL_COUNT

: int  type. The number of channels of the device to be used.

SAMPLING_RATE

: int type. The default value is 16000. Designates the sampling frequency, that is, the number of samples taken per second, of the loaded waveforms. When frequencies up to $\omega $ [Hz] are needed for processing, set the sampling frequency to over $2\omega $ [Hz]. A higher sampling frequency generally increases the amount of data and makes real-time processing more difficult.

DEVICETYPE

: string type. Select from ALSA, RASP, WS, TDBD16ADUSB, RASP24-16, RASP24-32, RASP-LC, and NETWORK according to the device used:

    • ALSA: a device supporting ALSA-based drivers, or RASP-LC connected directly to your PC.

    • RASP: RASP-2.

    • WS: wireless RASP.

    • TDBD16ADUSB: TD-BD-16ADUSB.

    • RASP24-16: RASP-24 with the 16-bit quantization bit rate.

    • RASP24-32: RASP-24 with the 24-bit quantization bit rate.

    • RASP-LC: RASP-LC with a wireless connection to your PC.

    • NETWORK: an IEEE-float-formatted raw audio stream received via a TCP/IP socket connection.

GAIN

: string type. The default value is 0dB. Sets the microphone gain for recording. Select from 0dB, 12dB, 24dB, 36dB, and 48dB. This parameter takes effect when RASP-24 is used.

DEVICE

: string type. The value to enter depends on DEVICETYPE; see the settings for each device in the following description.
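The value constraints stated in the parameter descriptions above can be collected into a small checker. The following Python sketch is only an illustration: check_params and the dictionary layout are hypothetical and not a HARK API, and the set of DEVICETYPE values covers only those named in the DEVICETYPE description (Windows-specific values are left out).

```python
# Values named in the DEVICETYPE and GAIN descriptions of this section.
DEVICETYPES = {"ALSA", "RASP", "WS", "TDBD16ADUSB",
               "RASP24-16", "RASP24-32", "RASP-LC", "NETWORK"}
GAINS = {"0dB", "12dB", "24dB", "36dB", "48dB"}

# Default values as listed in Table 6.2.
DEFAULTS = {
    "LENGTH": 512, "ADVANCE": 160, "CHANNEL_COUNT": 8, "SAMPLING_RATE": 16000,
    "DEVICETYPE": "WS", "GAIN": "0dB", "DEVICE": "127.0.0.1",
}


def check_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in ("LENGTH", "ADVANCE", "CHANNEL_COUNT", "SAMPLING_RATE"):
        value = params.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive int, got {value!r}")
    if params.get("DEVICETYPE") not in DEVICETYPES:
        problems.append(f"unknown DEVICETYPE {params.get('DEVICETYPE')!r}")
    if params.get("GAIN") not in GAINS:
        problems.append(f"unknown GAIN {params.get('GAIN')!r}")
    if not params.get("DEVICE"):
        problems.append("DEVICE must be a non-empty string")
    return problems


print(check_params(DEFAULTS))                                 # no problems
print(check_params(dict(DEFAULTS, DEVICETYPE="RASP24-24")))   # one problem
```

A check of this kind catches a misspelled DEVICETYPE before the network is run, which is easier to diagnose than a device that fails to open.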

6.1.1.5 Details of the node

HARK supports the following four kinds of audio input:

  1. The following RASP series manufactured by System In Frontier, Inc.

    • RASP-2

    • Wireless RASP

    • RASP-24

    • RASP-LC

  2. TD-BD-16ADUSB manufactured by Tokyo Electron Device Co., Ltd.

  3. ALSA-based devices, for example:

    • Kinect Xbox (manufactured by Microsoft)

    • PlayStation Eye (manufactured by Sony)

    • Microcone (manufactured by Dev-Audio)

    • RME Hammerfall DSP series Multiface

  4. Raw audio stream via TCP/IP socket connection (IEEE float wav format)

The following are parameter settings for each device.

RASP series:

RASP-2:

CHANNEL_COUNT   8
DEVICETYPE      WS
DEVICE          IP address of RASP-2

Wireless RASP:

CHANNEL_COUNT   16
DEVICETYPE      WS
DEVICE          IP address of Wireless RASP
Remarks         Some models of the RASP series have both microphone inputs and line inputs among the 16 channels. When such a model is used, connect a ChannelSelector node to the AUDIO output of the AudioStreamFromMic node and select only the microphone input channels.

RASP-24:

CHANNEL_COUNT   Multiples of 9
DEVICETYPE      RASP24-16 or RASP24-32
DEVICE          IP address of RASP-24
Remarks         Set DEVICETYPE=RASP24-16 to record with the 16-bit quantization bit rate, and DEVICETYPE=RASP24-32 to record with the 24-bit quantization bit rate. CHANNEL_COUNT should be a multiple of 9. Channels 0 to 7 are microphone channels, and channel 8 is a line input. For microphone array processing, connect a ChannelSelector node to the AUDIO output of the AudioStreamFromMic node and select only the microphone input channels.
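The channel layout described for RASP-24 can be turned into a list of microphone channel indices for ChannelSelector. The following Python sketch rests on two assumptions that are not stated in this section: that the pattern of eight microphone channels followed by one line input repeats for every group of 9 channels when CHANNEL_COUNT is larger than 9, and that the selection is written in the form <Vector<int> ...> (check the ChannelSelector description before relying on this). Both function names are hypothetical.

```python
def rasp24_mic_channels(channel_count):
    """List the microphone channel indices for a RASP-24 recording.

    Assumption: in every group of 9 channels, indices 0-7 are microphones
    and index 8 is the line input.
    """
    if channel_count <= 0 or channel_count % 9 != 0:
        raise ValueError("CHANNEL_COUNT must be a positive multiple of 9")
    return [ch for ch in range(channel_count) if ch % 9 != 8]


def to_selector(channels):
    """Format channel indices as a vector string (assumed notation)."""
    return "<Vector<int> " + " ".join(str(ch) for ch in channels) + ">"


print(rasp24_mic_channels(9))                # [0, 1, 2, 3, 4, 5, 6, 7]
print(to_selector(rasp24_mic_channels(9)))   # <Vector<int> 0 1 2 3 4 5 6 7>
```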

RASP-LC:

CHANNEL_COUNT   8
DEVICETYPE      ALSA or RASP-LC
DEVICE          If DEVICETYPE=ALSA, the DEVICE parameter should be plughw:a,b. See “Device corresponding to ALSA” for details of this setting. If DEVICETYPE=RASP-LC, the DEVICE parameter should be the IP address of RASP-LC.
Remarks         If the RASP-LC is connected directly to the USB interface of the PC, set DEVICETYPE=ALSA. If the RASP-LC is connected to the PC through the wireless LAN, set DEVICETYPE=RASP-LC. All the channels are microphone channels.

Devices manufactured by Tokyo Electron Device Ltd.:

CHANNEL_COUNT   16
DEVICETYPE      TDBD16ADUSB
DEVICE          TDBD16ADUSB

Device corresponding to ALSA:

To use an ALSA device, designate plughw:a,b as the DEVICE parameter, where a and b are non-negative integers. For a, enter the card number shown by arecord -l. When multiple audio input devices are connected, multiple card numbers are shown; enter the number of the card to be used. For b, enter the device number shown by arecord -l for that card. When a card has multiple devices, enter the number of the one to be used. A card that has both analog and digital inputs is one example of a card with multiple devices.
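The values of a and b can be read from the "card" and "device" fields that arecord -l prints for each capture device. The following Python sketch extracts candidate plughw names from such output. The sample text is a hypothetical listing written for illustration, not output captured from a real machine, and the function name is likewise hypothetical.

```python
import re

# Hypothetical example of what `arecord -l` may print.
SAMPLE_ARECORD_OUTPUT = """\
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC892 Analog [ALC892 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: DSP [Hammerfall DSP], device 0: RME Hammerfall DSP [RME Hammerfall DSP]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Matches lines of the form "card <a>: <label>, device <b>: ...".
CARD_LINE = re.compile(r"^card (\d+): (.+?), device (\d+): ", re.MULTILINE)


def list_plughw_names(arecord_output):
    """Return (DEVICE string, card label) pairs found in the listing."""
    return [(f"plughw:{card},{device}", label)
            for card, label, device in CARD_LINE.findall(arecord_output)]


for name, label in list_plughw_names(SAMPLE_ARECORD_OUTPUT):
    print(name, label)
```

On a real system the listing could be obtained with subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout instead of the sample string.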

Device                                CHANNEL_COUNT   DEVICETYPE   DEVICE
Kinect Xbox                           4               ALSA         plughw:a,b
PlayStation Eye                       4               ALSA         plughw:a,b
Microcone                             7               ALSA         plughw:a,b
RME Hammerfall DSP series Multiface   8               ALSA         plughw:a,b

Socket connection (if DEVICETYPE=NETWORK is selected):

DEVICE should be the IP address of the machine that sends the audio stream. Other parameters should be set according to the settings of the audio stream. If the audio has $M$ channels and $T$ samples can be obtained at once, the audio stream can be sent by a program like the following pseudo code.

WHILE(1){
    X = Get_Audio_Stream    (Suppose X is a T-by-M matrix.)
    FOR t = 0 to T - 1
        FOR m = 0 to M - 1
            DATA[M * t + m] = X[t][m]
        ENDFOR
    ENDFOR
    send(socket_id, (char*)DATA, M * T * sizeof(float), 0)
}

Here, $X$ is an IEEE-float-formatted audio stream. Therefore, every sample satisfies $-1 \leq X \leq 1$.
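The interleaving in the pseudo code above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference sender: the float values are packed as little-endian 32-bit IEEE floats (the pseudo code sends them in the native byte order of the sending machine), the socket is assumed to be already connected, and no port number or connection procedure is implied. The names pack_block and send_block are hypothetical.

```python
import socket
import struct


def pack_block(x):
    """Interleave a T-by-M block into little-endian float32 bytes.

    x is a list of T rows, each holding M samples in [-1, 1].
    Sample (t, m) lands at flat index M * t + m (0-based).
    """
    num_channels = len(x[0])
    flat = []
    for row in x:
        if len(row) != num_channels:
            raise ValueError("every row must have M samples")
        for sample in row:
            if not -1.0 <= sample <= 1.0:
                raise ValueError("samples must lie in [-1, 1]")
            flat.append(sample)
    return struct.pack("<%df" % len(flat), *flat)


def send_block(sock, x):
    """Send one block over an already connected socket."""
    sock.sendall(pack_block(x))


# Loopback demonstration with a connected socket pair (no HARK involved).
sender, receiver = socket.socketpair()
block = [[0.0, 0.5], [-0.5, 1.0], [0.25, -1.0]]  # T = 3 samples, M = 2 channels
send_block(sender, block)
payload = receiver.recv(4096)
print(struct.unpack("<6f", payload))
sender.close()
receiver.close()
```

Each call sends M * T * 4 bytes, which matches the M * T * sizeof(float) length in the pseudo code when float is 32 bits wide.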

Device corresponding to DirectSound on Windows OS:

On Windows, this node accepts DirectSound devices in addition to wireless RASP, RASP-24, and socket connections. Designate such a device by entering its device name in the DEVICE parameter. Note that multi-byte characters cannot be used in the DEVICE parameter.

There are two ways to find the device name. One is to use Device Manager, and the other is to use “Sound Device List”, which is provided by HARK. To use Sound Device List, click [Start] $\rightarrow $ [Programs] $\rightarrow $ [HARK] $\rightarrow $ [Sound Device List]. It then lists the names of all sound devices connected to your PC, as in Figure 6.5. A partial name can also be used. For instance, if “Hammerfall DSP” is listed, “Hammerfall” can be used for the DEVICE parameter. When two or more candidates match, AudioStreamFromMic uses the one nearest the top of the list.
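The partial-name rule described above can be pictured with the following Python sketch. It is not the code used by AudioStreamFromMic: it assumes a plain, case-sensitive substring match, treats "multi-byte characters" as any non-ASCII character, and uses a hypothetical device list and a hypothetical function name.

```python
def pick_device(device_param, listed_names):
    """Return the first listed name that contains device_param, or None.

    Assumptions: case-sensitive substring match; non-ASCII input is rejected.
    """
    if not device_param.isascii():
        raise ValueError("DEVICE must not contain multi-byte characters")
    for name in listed_names:  # list order decides between several matches
        if device_param in name:
            return name
    return None


# Hypothetical names in the order a device list might show them.
names = ["Realtek High Definition Audio", "Hammerfall DSP", "Hammerfall DSP (2)"]
print(pick_device("Hammerfall", names))  # Hammerfall DSP
```

If the first match is not the intended device, a longer partial name that matches only that device removes the ambiguity.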

For the three devices Kinect Xbox, PlayStation Eye, and Microcone, the values shown in the settings below can be used for the DEVICE parameter.

Device corresponding to ASIO on Windows OS:

For ASIO devices, such as Microcone or the RME Hammerfall DSP series Multiface, download and install the HARK ASIO Plugin from the HARK web page. In this case, use AudioStreamFromASIO instead of AudioStreamFromMic.

Parameter settings for DirectSound and ASIO devices on Windows OS:

CHANNEL_COUNT   DEVICETYPE   DEVICE
4               DS           kinect
4               DS           pseye
7               DS           microcone
8               DS           TAMAGO or tamago
8 or 16         DS           rasp
8               ASIO         ASIO Hammerfall DSP

\includegraphics[width=.5\textwidth ]{fig/modules/DeviceList}
Figure 6.5: Confirmation of the device name