Practice 2: Sound Source Separation and Recognition

The Practice 2 materials on this page are for HARK Ver. 3.6.0 or later.

Goal and Agenda of Practice2

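  • Goal
    • Learn how to implement localization, separation, and recognition
  • Contents
    • Build a system that recognizes two speakers talking at the same time
      • 2-1. Overview of sound source separation and speech recognition in HARK
      • 2-2. Structure of the sample network file
      • 2-3. Process pre-recorded audio and check the results
      • 2-4. Compute the recognition rate
      • 2-5. Try online sound source separation and speech recognition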


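Practice2-1: Building Simultaneous Speech Recognition with HARK

Overview of simultaneous speech recognition (the "Prince Shotoku robot", named after the legendary prince who could listen to several people at once)

Speech from several people talking at the same time is captured with a microphone array. The system then estimates the source directions ("sound source localization"), tracks the sources ("sound source tracking"), separates them ("sound source separation"), suppresses noise ("noise suppression"), extracts features, and performs speech recognition.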
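The localization step uses MUSIC (LocalizeMUSIC with MUSIC_ALGORITHM=SEVD in the network file). As a rough illustration of the principle only, here is a minimal narrowband MUSIC sketch for a single frequency bin, assuming NumPy, a hypothetical 8-microphone circular array of 5 cm radius, and free-field steering vectors. HARK itself uses the measured transfer functions in the A_MATRIX file and combines many frequency bins, so this is not its implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def steering_vector(azimuth_deg, mic_xy, freq_hz):
    """Far-field, free-field steering vector for a planar array.

    Assumption for illustration: HARK uses measured transfer functions instead."""
    theta = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    # Time advance of each microphone relative to the array centre (seconds)
    advance = mic_xy @ direction / SPEED_OF_SOUND
    return np.exp(2j * np.pi * freq_hz * advance)


def music_spectrum(corr, n_sources, candidates_deg, mic_xy, freq_hz):
    """Narrowband MUSIC pseudo-spectrum for one frequency bin."""
    # eigh returns eigenvalues of a Hermitian matrix in ascending order,
    # so the first (M - n_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(corr)
    noise_subspace = eigvecs[:, : corr.shape[0] - n_sources]
    spectrum = []
    for az in candidates_deg:
        a = steering_vector(az, mic_xy, freq_hz)
        proj = noise_subspace.conj().T @ a
        # Small epsilon avoids division by zero at the true direction
        denom = np.real(np.vdot(proj, proj)) + 1e-12
        spectrum.append(np.real(np.vdot(a, a)) / denom)
    return np.array(spectrum)


# Hypothetical 8-mic circular array (radius 5 cm); not TAMAGO's measured geometry
mic_angles = np.deg2rad(np.arange(8) * 45.0)
mic_xy = 0.05 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
freq = 2000.0

# Simulated spatial correlation matrix: one source at 45 degrees plus white noise
a_true = steering_vector(45.0, mic_xy, freq)
corr = np.outer(a_true, a_true.conj()) + 0.01 * np.eye(8)

candidates = np.arange(-180, 180, 5)
pseudo = music_spectrum(corr, 1, candidates, mic_xy, freq)
estimate = int(candidates[np.argmax(pseudo)])
print(estimate)  # 45
```

The peak of the pseudo-spectrum marks the direction whose steering vector is orthogonal to the noise subspace; with two talkers, n_sources would be 2 and the two largest peaks would be taken.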

Practice2-1: System Overview (Processing Flow)

Processing flow of Practice2-1
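  1. Sound source localization uses MUSIC, followed by sound source tracking as post-processing
  2. Sound source separation uses GHDSS, guided by the direction information, followed by noise suppression with post-filtering as post-processing
  3. Speech recognition extracts features from separated sounds whose direction information is consistent, passes the features of each source to the speech recognition engine, and requests recognition
  4. The speech recognition results are presented for each source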
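In the network file, the post-filter of step 2 is assembled in sub_separation from HRLE (stationary noise estimate), EstimateLeak (leakage from the other separated sources), CalcSpecAddPower (their sum), and CalcSpecSubGain / SpectralGainFilter (gain computation and application). The following is a minimal sketch of that idea for one frequency bin, reusing the parameter values that appear in the network file (ALPHA=1.0, BETA=0.0, power spectral subtraction, LEAK_FACTOR=0.25). The leak model (a fixed fraction of the other outputs' power) and the gain formula are simplifying assumptions, not HARK's exact equations.

```python
import math


def estimate_leak(powers, leak_factor=0.25):
    """Simplified leak estimate: each separated output is assumed to contain
    leak_factor times the summed power of the other outputs (assumption)."""
    total = sum(powers)
    return [leak_factor * (total - p) for p in powers]


def power_ss_gain(power, noise_power, alpha=1.0, beta=0.0):
    """Amplitude gain for one bin from power spectral subtraction.

    alpha: over-subtraction factor; beta: spectral floor as a fraction of power."""
    if power <= 0.0:
        return 0.0
    residual = max(power - alpha * noise_power, beta * power)
    return math.sqrt(residual / power)


# One frequency bin, two GHDSS outputs (hypothetical numbers)
sep_power = [4.0, 1.0]           # power of each separated source in this bin
stationary_noise = [0.2, 0.2]    # e.g. an HRLE-style stationary noise estimate
leak = estimate_leak(sep_power)  # [0.25, 1.0]
noise = [n + l for n, l in zip(stationary_noise, leak)]  # cf. CalcSpecAddPower
gains = [power_ss_gain(p, n) for p, n in zip(sep_power, noise)]
print([round(g, 3) for g in gains])  # [0.942, 0.0]
```

The weaker output is dominated by leakage from the stronger one in this bin, so its gain drops to zero; the stronger output is only slightly attenuated.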
Concrete steps:
  • The network file for simultaneous speech recognition is practice2_sep_rec.n (located in the HARK_tutorial_2024/practice2/data/ folder)
  • Launch HARK Designer and load practice2_sep_rec.n
  • Loading practice2_sep_rec.n

Practice2-2: Structure of practice2_sep_rec.n (Five Subnetworks)

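Subnetworks of Practice2-1
  • MAIN: the main function
  • MAIN sheet of Practice2-1
  • MAIN_LOOP: the function that is executed repeatedly
  • MAIN_LOOP sheet of Practice2-1
  • sub_localization: the sound source localization function
  • sub_localization sheet of Practice2-1
      Four changes from Practice1-1:
    1. AudioStreamFromWave and MultiFFT are merged into the upper-level MAIN_LOOP subnetwork
    2. SourceIntervalExtender is added (it adds an offset so that the beginning of each separated sound is not cut off)
    3. The localization results are now displayed with the Kivy-based plotQuickSourceKivy
    4. In the Property of the LocalizeMUSIC node, hark_conf/tamago_rectf.zip is specified as A_MATRIX (the transfer function file)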
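  • sub_separation: the sound source separation function
  • sub_separation sheet of Practice2-1
    • Sound source separation consists of three modules:
      • ① Sound source separation by the GHDSS node
      • ② Suppression of residual leakage by post-filtering
      • ③ File output of the separated sounds
    • In the Property of the GHDSS node, hark_conf/tamago_rectf.zip is specified as TF_CONJ_FILENAME (the transfer function file)
    • Pre-measured transfer functions are available for several microphone arrays, and you can also prepare your own.
      • TAMAGO, PSEye, Kurage-kun (くらげクン), Kinect, Microcone
      • Pre-measured transfer functions: https://hark.jp/document/supported/
      • You can also create your own by measuring TSP signals and processing them with harktool5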
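  • sub_recognition: the feature extraction function for speech recognition
  • sub_recognition sheet of Practice2-1
    • ① WhiteNoiseAdder: adds white noise to improve the recognition accuracy of the separated speech
    • ② PreEmphasis: emphasizes the high-frequency range
    • ③ MelFilterBank: mel filter bank analysis
    • ④ MSLSExtraction: MSLS (Mel-scale log spectrum) feature extraction
    • ⑤ SpeechRecognitionClient: sends the speech features to the speech recognizer over a socket connection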
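Steps ② to ④ of sub_recognition can be sketched in plain Python. This is a simplified illustration under stated assumptions, not HARK's implementation: it uses the common mel formula 2595·log10(1 + f/700), applies pre-emphasis as the power response of the filter 1 - 0.97 z^-1 (PREEMCOEF=0.97 and INPUT_TYPE=SPECTRUM in the network file), takes the MelFilterBank settings from the network file (40 filters, 63 to 8000 Hz, 512-point FFT at 16 kHz), and omits the normalization that MSLSExtraction performs (NORMALIZATION_MODE=SPECTRAL).

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def preemphasis_spectrum(power_spec, n_fft, coef=0.97):
    """Apply pre-emphasis 1 - coef*z^-1 to a power spectrum (bins 0..n_fft/2)."""
    out = []
    for k, p in enumerate(power_spec):
        w = 2.0 * math.pi * k / n_fft
        # |1 - coef*e^{-jw}|^2 = 1 - 2*coef*cos(w) + coef^2
        out.append(p * (1.0 - 2.0 * coef * math.cos(w) + coef * coef))
    return out


def mel_filterbank(power_spec, n_fft, sr, n_filters, fmin=63.0, fmax=8000.0):
    """Energies of triangular filters equally spaced on the mel scale."""
    mel_lo, mel_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    hz_pts = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    bin_hz = sr / n_fft
    energies = []
    for i in range(1, n_filters + 1):
        lo, center, hi = hz_pts[i - 1], hz_pts[i], hz_pts[i + 1]
        e = 0.0
        for k, p in enumerate(power_spec):
            f = k * bin_hz
            if lo < f <= center:      # rising edge of the triangle
                e += p * (f - lo) / (center - lo)
            elif center < f < hi:     # falling edge of the triangle
                e += p * (hi - f) / (hi - center)
        energies.append(e)
    return energies


def log_mel_spectrum(energies, floor=1e-10):
    """Log mel-scale spectrum (MSLS before any normalization)."""
    return [math.log(max(e, floor)) for e in energies]


N_FFT, SR, FBANK_COUNT = 512, 16000, 40
flat = [1.0] * (N_FFT // 2 + 1)  # hypothetical flat power spectrum
feat = log_mel_spectrum(
    mel_filterbank(preemphasis_spectrum(flat, N_FFT), N_FFT, SR, FBANK_COUNT))
print(len(feat))  # 40
```

Working in the spectral domain lets the same spectrum feed both MelFilterBank and MSLSExtraction, which matches how PreEmphasis is wired to both nodes in the network file.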

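Practice2-3: Running Simultaneous Speech Recognition

Two-speaker simultaneous speech recognition
  • Localize, separate, and recognize simultaneous speech from two speakers recorded with TAMAGO
  • Files used
    • 2SPK-ja.wav: 8-channel recording of the mixed speech
    • transcription_{A,B}.txt: ground-truth data
      • Transcriptions of 2SPK-ja.wav
      • Words on the same line of the two files were uttered at the same time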

Starting the speech recognition engine KaldiDecoder

  1. Open a terminal and move to practice2/data
    $ cd ~/HARK_tutorial_2024/practice2/data
  2. Run 1_run_ASR.sh
    $ sh 1_run_ASR.sh
  3. A new terminal pops up; wait until it shows output like the following
    Starting KaldiDecoder

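Running the HARK network

  1. Run 2_run_HARK.sh (if you open a new terminal, remember to move to ~/HARK_tutorial_2024/practice2/data first)
    $ sh 2_run_HARK.sh
  2. The run has succeeded if the localization results (graph) and the recognition results are displayed
    Display of the two-speaker simultaneous speech recognition results
  3. The separated sounds are saved in sep_files/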

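Checking the separated sounds

  • Play the sounds back with practice2_playback.n
  • Listen to the sound before separation
  • Playback network for the mixed sound before separation
    • Run 5_run_PlayAudio.sh
    • $ sh 5_run_PlayAudio.sh
  • Listen to the sounds after separation
  • Playback network for the separated sounds
    • Run 6-A_run_PlayAudio.sh (45° direction)
    • $ sh 6_A_run_PlayAudio.sh
    • Run 6-B_run_PlayAudio.sh (-35° direction)
    • $ sh 6_B_run_PlayAudio.sh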

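Practice2-4: Computing the Speech Recognition Rate and Checking Performance

  • Word recognition rate and Word Error Rate (WER)
  • Word recognition rate
    • C: Correct (number of correctly recognized words)
    • S: Substitution (number of substitution errors)
    • D: Deletion (number of deletion errors)
    • I: Insertion (number of insertion errors)
  • In this practice, only utterances within ±10 degrees of each source direction are evaluated
  • Tolerance for the source direction
  • Evaluation of speech recognition in practice2-1
  • Two-speaker speech recognition results
    • kaldi_out_progress.txt: incremental recognition results
    • kaldi_out.txt: final recognition results
    • Speech recognition results in practice2-1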
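The counts C, S, D, and I are commonly combined as Correct = C/N, Accuracy = (C - I)/N, and WER = (S + D + I)/N, where N = C + S + D is the number of reference words (so WER = 1 - Accuracy). The sketch below counts them with a standard minimum edit-distance alignment. It is an assumption that the tutorial's scripts use exactly these definitions and this alignment; they may also normalize the text differently.

```python
def align_counts(ref, hyp):
    """Count correct (C), substitution (S), deletion (D) and insertion (I) words
    by a minimum edit-distance alignment of reference and hypothesis word lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the end to classify each aligned word
    c = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return c, s, d, ins


def word_scores(ref, hyp):
    c, s, d, ins = align_counts(ref, hyp)
    n = c + s + d  # number of reference words
    return {"correct": c / n, "accuracy": (c - ins) / n, "wer": (s + d + ins) / n}


ref = "turn on the kitchen light".split()
hyp = "turn on the kitten light please".split()
print(align_counts(ref, hyp))  # (4, 1, 0, 1)
```

Here "kitchen" → "kitten" is one substitution and "please" is one insertion, giving Correct 0.8, Accuracy 0.6, and WER 0.4.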

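Automatic computation of the recognition rate

  1. Run 3-{A,B}_recognition_rate.sh
    $ sh 3-A_recognition_rate.sh
    $ sh 3-B_recognition_rate.sh
    Settings of each script:
    Script                    Reference transcription   Source direction   Tolerance
    3-A_recognition_rate.sh   transcription_A.txt       45°                10°
    3-B_recognition_rate.sh   transcription_B.txt       -35°               10°
  2. The run has succeeded if output like the following is displayed
    Recognition rate computation results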
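Each script pairs a reference transcription with a source direction and a ±10° tolerance. How the scripts actually parse kaldi_out.txt is not shown on this page, so the following is a hypothetical sketch of only the direction test: recognition results are assumed to be (azimuth, text) pairs, and the angular difference wraps around ±180°.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def select_by_direction(results, target_deg, tolerance_deg):
    """Keep the texts of (azimuth, text) results within ±tolerance_deg of target_deg."""
    return [text for az, text in results
            if angular_difference(az, target_deg) <= tolerance_deg]


# Hypothetical recognition results: (estimated azimuth in degrees, recognized text)
results = [(44.2, "utterance 1"), (-33.0, "utterance 2"),
           (57.0, "utterance 3"), (-178.0, "utterance 4")]
print(select_by_direction(results, 45.0, 10.0))   # ['utterance 1']
print(select_by_direction(results, -35.0, 10.0))  # ['utterance 2']
```

Wrapping the difference matters near ±180°: 179° and -178° are only 3° apart, even though their plain difference is 357°.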

Contents of the network file practice2_sep_rec.n

#!/usr/bin/env batchflow
<?xml version="1.0"?>
<Document>
  <Network type="subnet" name="MAIN">
    <Node name="node_Constant_1" type="Constant" x="100" y="100">
      <Parameter name="VALUE" type="subnet_param" value="ARG1" description="The value"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_InputStream_1" type="InputStream" x="280" y="100">
      <Parameter name="TYPE" type="String" value="" description="Type of stream: stream, fd, or FILE (default stream)"/>
      <Parameter name="RETRY" type="int" value="" description="If set to N, InputStream will retry N times on open fail"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MAIN_LOOP_1" type="MAIN_LOOP" x="490" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
    </Node>
    <Link from="node_Constant_1" output="VALUE" to="node_InputStream_1" input="INPUT"/>
    <Link from="node_InputStream_1" output="OUTPUT" to="node_MAIN_LOOP_1" input="INPUT"/>
    <NetOutput name="ASR-A" node="node_MAIN_LOOP_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT" node="node_MAIN_LOOP_1" terminal="OUTPUT" object_type="any" description="Dynamic"/>
  </Network>
  <Network type="iterator" name="MAIN_LOOP">
    <Node name="node_MultiFFT_1" type="MultiFFT" x="610" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="FFT length in sample. [default: 512]"/>
      <Parameter name="WINDOW" type="string" value="CONJ" description="A window function for FFT. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: CONJ]"/>
      <Parameter name="WINDOW_LENGTH" type="subnet_param" value="LENGTH" description="Window length of the window function. [default: 512]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_AudioStreamFromWave_1" type="AudioStreamFromWave" x="100" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="subnet_param" value="ADVANCE" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="USE_WAIT" type="bool" value="true" description="If true, real recording is simulated [default: false]."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_sub_separation_1" type="sub_separation" x="920" y="100">
    </Node>
    <Node name="node_sub_recognition_1" type="sub_recognition" x="1070" y="300">
      <Parameter name="FBANK_COUNT" type="int" value="40" description="The size of the input feature vector."/>
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate in Hz.  [default: 16000]"/>
    </Node>
    <Node name="node_sub_localization_1" type="sub_localization" x="740" y="320">
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling Rate (Hz)."/>
    </Node>
    <Node name="node_MultiGain_1" type="MultiGain" x="410" y="100">
      <Parameter name="GAIN" type="float" value="0.000244140625" description="Gain factor."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_localization_1" input="WAV"/>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_separation_1" input="SPEC"/>
    <Link from="node_sub_separation_1" output="POSTFLT_SPEC" to="node_sub_recognition_1" input="SPEC"/>
    <Link from="node_MultiGain_1" output="OUTPUT" to="node_MultiFFT_1" input="INPUT"/>
    <Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiGain_1" input="INPUT"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_separation_1" input="SRC_INFO"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_recognition_1" input="SOURCES"/>
    <NetInput name="INPUT" node="node_AudioStreamFromWave_1" terminal="INPUT" object_type="Stream" description="An audio input stream (IStream)."/>
    <NetOutput name="ASR-A" node="node_sub_recognition_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT" node="node_sub_separation_1" terminal="OUTPUT" object_type="any" description="Dynamic"/>
    <NetCondition name="CONDITION" node="node_AudioStreamFromWave_1" terminal="NOT_EOF"/>
  </Network>
  <Network type="subnet" name="sub_separation">
    <Node name="node_PowerCalcForMap_1" type="PowerCalcForMap" x="100" y="330">
      <Parameter name="POWER_TYPE" type="string" value="POW" description="Measure for computing the POW or MAG (i.e. power or magnitude) of the complex spectrum [default: POW]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_CalcSpecAddPower_1" type="CalcSpecAddPower" x="760" y="530">
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_CalcSpecSubGain_1" type="CalcSpecSubGain" x="510" y="250">
      <Parameter name="ALPHA" type="float" value="1.0" description="Overestimation factor."/>
      <Parameter name="BETA" type="float" value="0.0" description="Spectral floor."/>
      <Parameter name="SS_METHOD" type="int" value="2" description="1: Magnitude Spectral Subtraction, 2: Power SS"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_EstimateLeak_1" type="EstimateLeak" x="360" y="600">
      <Parameter name="LEAK_FACTOR" type="float" value="0.25" description="Leak factor [default:0.25]"/>
      <Parameter name="OVER_CANCEL_FACTOR" type="float" value="1" description="Over cancel value [default:1]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_HRLE_1" type="HRLE" x="440" y="470">
      <Parameter name="LX" type="float" value="0.10" description="Lx value of estimation, e.g. Lx=0 -&gt; Minimum (MCRA), Lx=0.5 -&gt; Median , Lx=1.0 -&gt; Maximum [default:0.85]"/>
      <Parameter name="TIME_CONST_METHOD" type="string" value="LEGACY" description="Time constant value definition, &quot;LEGACY&quot; uses time constant value for HARK 2.0.0,&quot;MILLISECOND&quot; uses time constant value in frames. [default: LEGACY]"/>
      <Parameter name="TIME_CONSTANT" type="float" value="16000" description="Time constant for exponential decay window in samples [default:]"/>
      <Parameter name="DECAY_FACTOR" type="int" value="1000" description="Time constant for exponential decay window in millisecond [default:]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The length in sample between a frame and a previous frame. [default: 160]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="NUM_BIN" type="float" value="2000" description="Number of histogram bins [default:1000]"/>
      <Parameter name="MIN_LEVEL" type="float" value="-200" description="Minimum level of histogram bin in dB [default:-100]"/>
      <Parameter name="STEP_LEVEL" type="float" value="0.2" description="Step level of histogram bin (Width of each histogram bin) in dB [default:0.2]"/>
      <Parameter name="DEBUG" type="bool" value="false" description="Prints the histogram for each 100 iterations."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpectralGainFilter_1" type="SpectralGainFilter" x="1000" y="100">
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_GHDSS_1" type="GHDSS" x="160" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="0" description="Lower bound of frequency (Hz). [default: 0]"/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="8000" description="Upper bound of frequency (Hz). [default: 8000]"/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="TF_CONJ_FILENAME" type="string" value="hark_conf/tamago_rectf.zip" description="Filename of a pre-measured transfer function for separation."/>
      <Parameter name="INITW_FILENAME" type="string" value="" description="Filename of an initial separation matrix. If specified, a matrix in INITW_FILENAME is used as an initial separation matrix. Otherwise, initial separation matrix is estimated from the geometrical relationship or pre-measured TF according to TF_CONJ."/>
      <Parameter name="SS_METHOD" type="string" value="ADAPTIVE" description="The calculation method for SS step size parameter corresponding to the blind separation part. &quot;FIX&quot; uses a fixed step size,&quot;LC_MYU&quot; uses the same value as LC_MYU, and &quot;ADAPTIVE&quot; adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="SS_SCAL" type="float" value="1.0" description="Scaling factor for SS step size. [default: 1.0]"/>
      <Parameter name="SS_MYU" type="float" value="0.001" description="SS step size value. [default 0.001]"/>
      <Parameter name="NOISE_FLOOR" type="float" value="0.0" description="Noise floor value. [default 0.0]"/>
      <Parameter name="LC_CONST" type="string" value="DIAG" description="The calculation method for geometric constraints. &quot;FULL&quot; uses all elements of a matrix, and &quot;DIAG&quot; only uses diagonal parts. [default: FULL]"/>
      <Parameter name="LC_METHOD" type="string" value="ADAPTIVE" description="The calculation method for LC step size corresponding to geometric constraints. &quot;FIX&quot; uses a fixed value, and &quot;Adaptive&quot; adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="LC_MYU" type="float" value="0.001" description="LC step size value. [default 0.001]"/>
      <Parameter name="UPDATE_METHOD_TF_CONJ" type="string" value="POS" description="Switching method of TF_CONJ data. [default: POS]"/>
      <Parameter name="UPDATE_METHOD_W" type="string" value="ID" description="Switching method of separation matrix, W. [default: ID]"/>
      <Parameter name="UPDATE_ACCEPT_DISTANCE" type="float" value="300" description="Distance allowance to switch separation matrix in [mm]. available when when UPDATE_METHOD_W is POS or ID_POS. [default: 300.0]"/>
      <Parameter name="EXPORT_W" type="bool" value="false" description="Separation matrix W is exported if true. [default: false]"/>
      <Parameter name="EXPORT_W_FILENAME" type="string" value="" description="The filename to export W."/>
      <Parameter name="UPDATE" type="string" value="STEP" description="The update method of separation matrix. &quot;STEP&quot; updates W sequentially, i.e., based on SS and then on LC cost. &quot;TOTAL&quot; updates W based on an integrated value of SS and LC cost [default: STEP]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SaveWavePCM_1" type="SaveWavePCM" x="1210" y="430">
      <Parameter name="BASENAME" type="string" value="sep_files/sep_" description="Basename of files. [default: sep_]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (in samples)[default: 16000]."/>
      <Parameter name="BITS" type="string" value="int24" description="Bit format of samples. int16 and int24  bits are supported."/>
      <Parameter name="INPUT_BITS" type="string" value="auto" description="Bit format of input wav file."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_Synthesize_1" type="Synthesize" x="1220" y="280">
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The length in sample between a frame and a previous frame. [default: 160]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="MIN_FREQUENCY" type="int" value="125" description="Minimum frequency (Hz) [default: 125]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="7900" description="Maximum frequency (Hz) [default: 7900]"/>
      <Parameter name="WINDOW" type="string" value="HAMMING" description="A window function for overlap-add. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: HAMMING]"/>
      <Parameter name="OUTPUT_GAIN" type="float" value="1.0" description="Output gain factor. [default: 1.0]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_HRLE_1" output="NOISE_SPEC" to="node_CalcSpecAddPower_1" input="INPUT_POWER_SPEC1"/>
    <Link from="node_EstimateLeak_1" output="LEAK_POWER_SPEC" to="node_CalcSpecAddPower_1" input="INPUT_POWER_SPEC2"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_CalcSpecSubGain_1" input="INPUT_POWER_SPEC"/>
    <Link from="node_CalcSpecAddPower_1" output="OUTPUT_POWER_SPEC" to="node_CalcSpecSubGain_1" input="NOISE_SPEC"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_EstimateLeak_1" input="INPUT_POWER_SPEC"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_HRLE_1" input="INPUT_SPEC"/>
    <Link from="node_CalcSpecSubGain_1" output="VOICE_PROB" to="node_SpectralGainFilter_1" input="VOICE_PROB"/>
    <Link from="node_CalcSpecSubGain_1" output="GAIN" to="node_SpectralGainFilter_1" input="GAIN"/>
    <Link from="node_GHDSS_1" output="OUTPUT" to="node_PowerCalcForMap_1" input="INPUT"/>
    <Link from="node_GHDSS_1" output="OUTPUT" to="node_SpectralGainFilter_1" input="INPUT_SPEC"/>
    <Link from="node_Synthesize_1" output="OUTPUT" to="node_SaveWavePCM_1" input="INPUT"/>
    <Link from="node_SpectralGainFilter_1" output="OUTPUT_SPEC" to="node_Synthesize_1" input="INPUT"/>
    <NetInput name="SPEC" node="node_GHDSS_1" terminal="INPUT_FRAMES" object_type="Matrix&lt;complex&lt;float&gt; &gt;" description="Input multi-channel spectrum. A row is a channel, and a column is a spectrum for the corresponding channel."/>
    <NetInput name="SRC_INFO" node="node_GHDSS_1" terminal="INPUT_SOURCES" object_type="Vector&lt;ObjectRef&gt;" description="Source locations with ID. Each element of the vector is a source location with ID specified by &quot;Source&quot;."/>
    <NetOutput name="POSTFLT_SPEC" node="node_SpectralGainFilter_1" terminal="OUTPUT_SPEC" object_type="Map&lt;int,ObjectRef&gt;" description="Estimated voice spectrum(Vector&lt;complex&lt;float&gt; &gt;) with a key(source ID)."/>
    <NetOutput name="OUTPUT" node="node_SaveWavePCM_1" terminal="OUTPUT" object_type="Map&lt;int,ObjectRef&gt;" description="The same as input."/>
  </Network>
  <Network type="subnet" name="sub_recognition">
    <Node name="node_Delta_1" type="Delta" x="100" y="310">
      <Parameter name="FBANK_COUNT" type="int" value="41" description="The size of the input feature vector."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_FeatureRemover_1" type="FeatureRemover" x="370" y="310">
      <Parameter name="SELECTOR" type="object" value="&lt;Vector&lt;int&gt; 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81&gt;" description="Component indices in a feature vector to remove. E.g. &lt;Vector&lt;int&gt; 13&gt; to remove 14th comopnent (The index start with 0)."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MSLSExtraction_1" type="MSLSExtraction" x="800" y="130">
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="Size of the static part of MSLS feature vector. [default: 13]"/>
      <Parameter name="NORMALIZATION_MODE" type="string" value="SPECTRAL" description="The domain to perform normalization. CEPSTRAL or SPECTRAL. [default: CEPSTRAL]"/>
      <Parameter name="USE_LEGACY_MODE" type="bool" value="true" description="For more than 14 dimensions must use false. This parameter is preparing only for compatibility with HARK 2.x or earlier. [default: true]"/>
      <Parameter name="USE_HTK_LIFTER" type="bool" value="false" description="Use HTK liftering vector if true. [default: false]"/>
      <Parameter name="LIFTERING_COEF" type="int" value="22" description="The HTK liftering coefficient used in Cepstral mode. [default: 22]"/>
      <Parameter name="USE_POWER" type="bool" value="true" description="Use power feature if true. [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MelFilterBank_1" type="MelFilterBank" x="570" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz.  [default: 16000]"/>
      <Parameter name="CUTOFF" type="int" value="8000" description="Cutoff frequency in Hz. Mel-filterbanks are placed between 0 Hz and CUTOFF Hz. [default: 8000]"/>
      <Parameter name="MIN_FREQUENCY" type="int" value="63" description="Minimum frequency (Hz) [default: 63]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="8000" description="Maximum frequency (Hz) [default: 8000]"/>
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="The number of Mel filter banks. [default: 13]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_PreEmphasis_1" type="PreEmphasis" x="350" y="150">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="window length in sample [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz [default: 16000]"/>
      <Parameter name="PREEMCOEF" type="float" value="0.97" description="pre-emphasis coefficient [default: 0.97]"/>
      <Parameter name="INPUT_TYPE" type="string" value="SPECTRUM" description="The domain to perform pre-emphasis [default: WAV]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpeechRecognitionClient_1" type="SpeechRecognitionClient" x="690" y="310">
      <Parameter name="MFM_ENABLED" type="bool" value="false" description="MFM is enbaled if true. [default: true]"/>
      <Parameter name="HOST" type="string" value="127.0.0.1" description="Hostname or IP of Julius/Julian server. [default: 127.0.0.1]"/>
      <Parameter name="PORT" type="int" value="5530" description="Port number of Julius/Julian server. [default: 5530]"/>
      <Parameter name="SOCKET_ENABLED" type="bool" value="true" description="send data via socket if true. [default: true]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_WhiteNoiseAdder_1" type="WhiteNoiseAdder" x="100" y="150">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="WN_LEVEL" type="float" value="0.1" description="An amplitude of white noise to be added. [default: 0]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MelFilterBank_1" output="OUTPUT" to="node_MSLSExtraction_1" input="FBANK"/>
    <Link from="node_MSLSExtraction_1" output="OUTPUT" to="node_Delta_1" input="INPUT"/>
    <Link from="node_Delta_1" output="OUTPUT" to="node_FeatureRemover_1" input="INPUT"/>
    <Link from="node_FeatureRemover_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="FEATURES"/>
    <Link from="node_FeatureRemover_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="MASKS"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MelFilterBank_1" input="INPUT"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MSLSExtraction_1" input="SPECTRUM"/>
    <Link from="node_WhiteNoiseAdder_1" output="OUTPUT" to="node_PreEmphasis_1" input="INPUT"/>
    <NetInput name="SPEC" node="node_WhiteNoiseAdder_1" terminal="INPUT" object_type="Map&lt;int,ObjectRef&gt;" description="Input spectrum. The key is source ID, and the value is a spectrum (Vector&lt;complex&lt;float&gt; &gt;)."/>
    <NetInput name="SOURCES" node="node_SpeechRecognitionClient_1" terminal="SOURCES" object_type="Vector&lt;ObjectRef&gt;" description="Source locations with ID. Each element of the vector is a source location with ID specified by &quot;Source&quot;."/>
    <NetOutput name="ASR-A" node="node_SpeechRecognitionClient_1" terminal="OUTPUT" object_type="Vector&lt;ObjectRef&gt;" description="The same as SOURCES."/>
  </Network>
  <Network type="subnet" name="sub_localization">
    <Node name="node_LocalizeMUSIC_1" type="LocalizeMUSIC" x="100" y="100">
      <Parameter name="MUSIC_ALGORITHM" type="string" value="SEVD" description="Sound Source Localization Algorithm. If SEVD, NOISECM will be ignored"/>
      <Parameter name="TF_CHANNEL_SELECTION" type="object" value="&lt;Vector&lt;int&gt; 0 1 2 3 4 5 6 7&gt;" description="Microphone channels for localization. If vacant, all channels will be used."/>
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling Rate (Hz)."/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="A_MATRIX" type="string" value="hark_conf/tamago_rectf.zip" description="Filename of a transfer function matrix."/>
      <Parameter name="WINDOW" type="int" value="50" description="The number of frames used for calculating a correlation function."/>
      <Parameter name="WINDOW_TYPE" type="string" value="MIDDLE" description="Window selection to accumulate a correlation function. If PAST, the past WINDOW frames from the current frame are used for the accumulation. If MIDDLE, the current frame will be the middle of the accumulated frames. If FUTURE, the future WINDOW frames from the current frame are used for the accumulation. FUTURE is the default from version 1.0, but this makes a delay since we have to wait for the future information. PAST generates a internal buffers for the accumulation, which realizes no delay for localization."/>
      <Parameter name="PERIOD" type="int" value="50" description="The period in which the source localization is processed."/>
      <Parameter name="NUM_SOURCE" type="int" value="2" description="Number of sources, which should be less than number of channels."/>
      <Parameter name="MIN_DEG" type="int" value="-180" description="source direction (lower)."/>
      <Parameter name="MAX_DEG" type="int" value="180" description="source direction (higher)."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="3000" description="Lower bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="6000" description="Upper bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="SPECTRUM_WEIGHT_TYPE" type="string" value="A_Characteristic" description="MUSIC spectrum weight for each frequency bin."/>
      <Parameter name="A_CHAR_SCALING" type="float" value="1.0" description="Scaling factor of the A-Weight with respect to frequency"/>
      <Parameter name="MANUAL_WEIGHT_SPLINE" type="object" value="&lt;Matrix&lt;float&gt; &lt;rows 2&gt; &lt;cols 5&gt; &lt;data 0.0 2000.0 4000.0 6000.0 8000.0 1.0 1.0 1.0 1.0 1.0&gt; &gt;" description="MUSIC spectrum weight for each frequency bin. This is a 2 by M matrix. The first row represents the frequency, and the second row represents the weight gain. &quot;M&quot; represents the number of key points for the spectrum weight. The frequency range between M key points will be interpolated by spline manner. The format is &quot;&lt;Matrix&lt;float&gt; &lt;rows 2&gt; &lt;cols 2&gt; &lt;data 1 2 3 4&gt; &gt;&quot;."/>
      <Parameter name="MANUAL_WEIGHT_SQUARE" type="object" value="&lt;Vector&lt;float&gt; 0.0 2000.0 4000.0 6000.0 8000.0&gt;" description="MUSIC spectrum weight for each frequency bin. This is a M order vector. The element represents the frequency points for the square wave. &quot;M&quot; represents the number of key points for the square wave weight. The format is &quot;&lt;Vector&lt;float&gt; 1 2 3 4&gt;&quot;."/>
      <Parameter name="ENABLE_EIGENVALUE_WEIGHT" type="bool" value="false" description="If true, the spatial spectrum is weighted depending on the eigenvalues of a correlation matrix. We do not suggest to use this function with GEVD and GSVD, because the NOISECM changes the eigenvalue drastically. Only useful for SEVD."/>
      <Parameter name="MAXNUM_OUT_PEAKS" type="int" value="-1" description="Maximum number of output peaks. If MAXNUM_OUT_PEAKS = NUM_SOURCE, this is compatible with HARK version 1.0. If MAXNUM_OUT_PEAKS = 0, all local maxima are output. If MAXNUM_OUT_PEAKS &lt; 0, MAXNUM_OUT_PEAKS is set to NUM_SOURCE. If MAXNUM_OUT_PEAKS &gt; 0, number of output peaks is limited to MAXNUM_OUT_PEAKS."/>
      <Parameter name="DEBUG" type="bool" value="true" description="Debug option. If the parameter is true, this node outputs sound localization results to a standard output."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceTracker_1" type="SourceTracker" x="320" y="100">
      <Parameter name="THRESH" type="float" value="27.0" description="Power threshold for localization results. A localization result with higher power than THRESH is tracked, otherwise ignored."/>
      <Parameter name="PAUSE_LENGTH" type="float" value="1200" description="Life duration of source in ms. When any localization result for a source is found for more than PAUSE_LENGTH / 10 iterations, the source is terminated. [default: 800]"/>
      <Parameter name="MIN_SRC_INTERVAL" type="float" value="20" description="Source interval threshold in degree. When the angle between a localization result and a source is smaller than MIN_SRC_INTERVAL, the same ID is given to the localization result. [default: 20]"/>
      <Parameter name="MIN_ID" type="int" value="0" description="Minimum ID of source locations. MIN_ID should be greater than 0 or equal."/>
      <Parameter name="DEBUG" type="bool" value="false" description="Output debug information if true [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceIntervalExtender_1" type="SourceIntervalExtender" x="550" y="100">
      <Parameter name="PREROLL_LENGTH" type="int" value="80" description="Preroll length in frame. [default: 50]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_plotQuickSourceKivy_1" type="plotQuickSourceKivy" x="840" y="100">
    </Node>
    <Link from="node_LocalizeMUSIC_1" output="OUTPUT" to="node_SourceTracker_1" input="INPUT"/>
    <Link from="node_SourceTracker_1" output="OUTPUT" to="node_SourceIntervalExtender_1" input="SOURCES"/>
    <Link from="node_SourceIntervalExtender_1" output="OUTPUT" to="node_plotQuickSourceKivy_1" input="SOURCES"/>
    <NetInput name="WAV" node="node_LocalizeMUSIC_1" terminal="INPUT" object_type="Matrix&lt;complex&lt;float&gt; &gt;" description="Multi-channel audio signals. In this matrix, a row is a channel, and a column is a sample."/>
    <NetOutput name="OUTPUT" node="node_plotQuickSourceKivy_1" terminal="OUTPUT" object_type="any" description=""/>
  </Network>
</Document>
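The frame-based parameters in sub_localization are easier to reason about when expressed in time. This is a minimal sketch. It assumes ADVANCE = 160 samples and SAMPLING_RATE = 16000 Hz, the values set in the MAIN network of practice2_online.n, which gives one frame every 10 ms. `frames_to_ms` is a hypothetical helper and is not part of HARK.

```python
# Convert HARK frame counts to milliseconds.
# Assumption: ADVANCE = 160 samples and SAMPLING_RATE = 16000 Hz
# (as in practice2_online.n), i.e. one frame shift every 10 ms.

ADVANCE = 160          # frame shift in samples
SAMPLING_RATE = 16000  # Hz


def frames_to_ms(n_frames: int, advance: int = ADVANCE,
                 sampling_rate: int = SAMPLING_RATE) -> float:
    """Duration spanned by n_frames frame shifts, in milliseconds."""
    return n_frames * advance * 1000.0 / sampling_rate


if __name__ == "__main__":
    # Frame-based parameter values copied from sub_localization above
    params = {
        "LocalizeMUSIC WINDOW": 50,
        "LocalizeMUSIC PERIOD": 50,
        "SourceIntervalExtender PREROLL_LENGTH": 80,
    }
    for name, frames in params.items():
        print(f"{name}: {frames} frames = {frames_to_ms(frames):.0f} ms")
```

With these values:

- LocalizeMUSIC accumulates its correlation over 50 frames, or about 0.5 s. Because WINDOW_TYPE is MIDDLE, that window is centered on the current frame.
- Localization is recomputed every 50 frames, also about 0.5 s (PERIOD = 50).
- SourceIntervalExtender starts each source interval 80 frames (0.8 s) earlier, so the onset of an utterance is not cut off.
- SourceTracker's PAUSE_LENGTH is already given in milliseconds (1200 ms), so it needs no conversion.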

 

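Practice2-5: Building an online sound source separation and speech recognition system

Online localization, separation, and recognition with a real TAMAGO device

Connecting the TAMAGO microphone array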

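  • If TAMAGO is not yet connected
    • When you plug in the USB cable, a dialog appears; select "Connect to a virtual machine" and click OK
    • (Figure: message shown when TAMAGO is connected via USB)
  • If TAMAGO is connected to the PC but not yet to the virtual machine
    • In VM Player, select Removable Devices > TAMAGO-XX > Connect (Disconnect from host)
    • (Figure: connecting TAMAGO to the virtual machine)
  • If TAMAGO is already connected to the virtual machine, nothing needs to be done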

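Checking the device

Check the device on the virtual machine:
  1. Open a terminal and run "arecord -l":
    $ arecord -l
  2. Confirm that "TAMAGOXX" appears in the list. If it does not, the device has not been recognized; repeat the TAMAGO connection procedure above
  3. Note the TAMAGO device name (used later), e.g., TAMAGO03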
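The device check above can also be scripted. This is a minimal sketch with two assumptions:

- `arecord -l` prints its card lines in the standard ALSA form `card N: ID [name], device M: ...`.
- The TAMAGO card ID or name contains the string "TAMAGO".

The card ID is what goes after `plughw:` in the DEVICE parameter, for example `plughw:TAMAGO03`. `find_alsa_device` is a hypothetical helper.

```python
import re
import subprocess
from typing import Optional

# Card lines printed by `arecord -l` have the form
#   card <N>: <ID> [<name>], device <M>: ...
# Group 1 is the ALSA card ID, group 2 the human-readable card name.
CARD_LINE = re.compile(r"^card \d+: (\S+) \[([^\]]*)\]")


def find_alsa_device(listing: str, keyword: str = "TAMAGO") -> Optional[str]:
    """Return 'plughw:<card ID>' for the first card whose ID or name contains keyword."""
    for line in listing.splitlines():
        m = CARD_LINE.match(line)
        if m and (keyword in m.group(1) or keyword in m.group(2)):
            return "plughw:" + m.group(1)
    return None


if __name__ == "__main__":
    try:
        out = subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout
    except FileNotFoundError:  # arecord (alsa-utils) is not installed
        out = ""
    print(find_alsa_device(out) or "TAMAGO not found: reconnect it to the virtual machine")
```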

Practice2-5: Checking practice2_online.n

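  • Load practice2_online.n from HARK_tutorial_2024/practice2/data/
  • Differences between this network and the offline-processing network
    • (Figure: differences from the offline-processing network)
  • Open the properties of AudioStreamFromMic on the MAIN_LOOP tab
  • Set DEVICE to plughw:<TAMAGO device name>, e.g., plughw:TAMAGO03
    • (Figure: AudioStreamFromMic property settings in MAIN_LOOP)
  • Only if you changed the value of DEVICE:
    • Save the network, then download practice2_online.n via File
    • Move practice2_online.n from your Downloads folder to HARK_tutorial_2024/practice2/data/
    • (Figure: updating the network file)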
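Instead of saving and downloading from HARK Designer, you can also change the DEVICE value by editing practice2_online.n as text. This is a minimal sketch. It assumes the Parameter line has the same layout as in the practice2_online.n listing on this page, with the name, type, and value attributes in that order. `set_device` is a hypothetical helper, and `plughw:TAMAGO03` is only an example that you should replace with your own device name. A plain text substitution is used rather than an XML parser so that the leading `#!/usr/bin/env batchflow` line and the rest of the file are left untouched.

```python
import re
from pathlib import Path

# The DEVICE parameter of AudioStreamFromMic, as laid out in practice2_online.n:
#   <Parameter name="DEVICE" type="string" value="plughw:TAMAGO03" description="..."/>
DEVICE_PARAM = re.compile(r'(<Parameter name="DEVICE" type="string" value=")[^"]*(")')


def set_device(network_text: str, device: str) -> str:
    """Return network_text with the DEVICE parameter value replaced by device."""
    # A function replacement avoids backslash/group escaping issues in `device`.
    new_text, count = DEVICE_PARAM.subn(lambda m: m.group(1) + device + m.group(2), network_text)
    if count == 0:
        raise ValueError("no DEVICE parameter found")
    return new_text


if __name__ == "__main__":
    # Hypothetical usage: run inside HARK_tutorial_2024/practice2/data/
    path = Path("practice2_online.n")
    if path.exists():
        path.write_text(set_device(path.read_text(), "plughw:TAMAGO03"))
```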

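Practice2-5 Running online sound source separation and speech recognition

  1. Start KaldiDecoder (first terminate any KaldiDecoder instance started earlier):
    $ cd ~/HARK_tutorial_2024/practice2/data
    $ sh 1_run_ASR.sh
  2. Run 2_run_HARK_online.sh (if you open a new terminal, remember to change to the directory above first):
    $ sh 2_run_HARK_online.sh
  3. A window showing the localization results pops up
  4. If recognition results are displayed, the system is working
    (Figure: azimuth angles of TAMAGO)
  5. Online separation and recognition of multiple speakers is now possible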

Practice2-5 Stopping the network

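  • Make the HARK window (MUSIC processing output) active and press Ctrl+C to stop it (the plot closes when HARK exits)
    • (Figure: stopping HARK)
  • Close the Kaldi (KaldiDecoder) and Kaldi output (recognition results) windows with the ✕ button
  • Run 2_sort.sh (if you open a new terminal, remember to change to the directory above first):
    $ sh 2_sort.sh
    (Figure: stopping Practice2-2)
  • Files created by the separation and recognition run:
    • rec_files/: recorded audio
    • sep_files/: separated audio
    • kaldi_out_progress.txt: incremental recognition results
    • kaldi_out.txt: final recognition results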
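After the network has stopped, listing the generated files is a quick way to confirm that the run produced output. This is a minimal sketch. It assumes it is run from the directory in which the scripts were executed, and that the recordings are WAV files readable by Python's standard `wave` module. Files that module cannot parse, for example because of an unsupported header format, are reported as unreadable rather than inspected. `wav_durations` is a hypothetical helper.

```python
import wave
from pathlib import Path
from typing import Dict, Optional

# Outputs of the separation/recognition run (names from the list above).
OUTPUT_DIRS = ["rec_files", "sep_files"]
OUTPUT_FILES = ["kaldi_out_progress.txt", "kaldi_out.txt"]


def wav_durations(folder: str) -> Dict[str, Optional[float]]:
    """Map each .wav file in folder to its duration in seconds (None if unreadable)."""
    folder_path = Path(folder)
    durations: Dict[str, Optional[float]] = {}
    if not folder_path.is_dir():
        return durations
    for path in sorted(folder_path.glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as w:
                durations[path.name] = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError):
            # Header the wave module does not support, or a truncated file
            durations[path.name] = None
    return durations


if __name__ == "__main__":
    for d in OUTPUT_DIRS:
        for name, sec in wav_durations(d).items():
            print(f"{d}/{name}: " + (f"{sec:.2f} s" if sec is not None else "unreadable"))
    for f in OUTPUT_FILES:
        print(f, "found" if Path(f).exists() else "missing")
```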

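Practice2-5 Checking the separated sounds

  1. Listen to the sound before separation by running 5_run_PlayAudio_online.sh:
    $ sh 5_run_PlayAudio_online.sh
  2. Listen to the separated sounds by running 6_run_PlayAudio_online.sh:
    $ sh 6_run_PlayAudio_online.sh [minimum azimuth (°)] [maximum azimuth (°)]
    (plays the sources whose azimuth lies between the minimum and the maximum)
  3. Check the recognition results (kaldi_out.txt):
    $ cat kaldi_out.txt
    (Figure: Kaldi recognition results)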
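6_run_PlayAudio_online.sh selects separated sources by azimuth range. The sketch below illustrates such a range filter; it is not the script's actual implementation. It rests on the following assumptions:

- Each localization result is a direction vector (x, y, z) in the microphone-array frame, keyed by source ID.
- Azimuth is measured from the x axis toward the y axis.

`azimuth_deg` and `sources_in_range` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical localization data: source ID -> direction vector (x, y, z)
Position = Tuple[float, float, float]


def azimuth_deg(pos: Position) -> float:
    """Azimuth in degrees, in [-180, 180], measured from the x axis toward y."""
    x, y, _ = pos
    return math.degrees(math.atan2(y, x))


def sources_in_range(sources: Dict[int, Position], min_deg: float, max_deg: float) -> List[int]:
    """Sorted IDs of sources whose azimuth lies within [min_deg, max_deg]."""
    return sorted(sid for sid, pos in sources.items()
                  if min_deg <= azimuth_deg(pos) <= max_deg)
```

A range that wraps around ±180° (for example, 150° to -150°) would need two calls or a modular comparison. The sketch handles only the simple, non-wrapping case.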

Contents of the network file practice2_online.n

#!/usr/bin/env batchflow
<?xml version="1.0"?>
<Document>
  <Network type="subnet" name="MAIN">
    <Node name="node_MAIN_LOOP_1" type="MAIN_LOOP" x="100" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
    </Node>
    <NetOutput name="ASR-A" node="node_MAIN_LOOP_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT" node="node_MAIN_LOOP_1" terminal="OUTPUT" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT_1" node="node_MAIN_LOOP_1" terminal="OUTPUT_1" object_type="any" description="Dynamic"/>
  </Network>
  <Network type="iterator" name="MAIN_LOOP">
    <Node name="node_MultiFFT_1" type="MultiFFT" x="630" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="FFT length in sample. [default: 512]"/>
      <Parameter name="WINDOW" type="string" value="CONJ" description="A window function for FFT. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: CONJ]"/>
      <Parameter name="WINDOW_LENGTH" type="subnet_param" value="LENGTH" description="Window length of the window function. [default: 512]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_sub_separation_1" type="sub_separation" x="940" y="100">
    </Node>
    <Node name="node_sub_recognition_1" type="sub_recognition" x="1090" y="300">
      <Parameter name="FBANK_COUNT" type="int" value="40" description="The size of the input feature vector."/>
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate in Hz.  [default: 16000]"/>
    </Node>
    <Node name="node_sub_localization_1" type="sub_localization" x="760" y="320">
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling Rate (Hz)."/>
    </Node>
    <Node name="node_MultiGain_1" type="MultiGain" x="430" y="100">
      <Parameter name="GAIN" type="float" value="0.0244" description="Gain factor."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_AudioStreamFromMic_1" type="AudioStreamFromMic" x="100" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="subnet_param" value="ADVANCE" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="CHANNEL_COUNT" type="int" value="8" description="The number of channels."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="DEVICETYPE" type="string" value="ALSA" description="Device type [default: WS]."/>
      <Parameter name="GAIN" type="string" value="0dB" description="capture gain (dB)  [default: 0dB]."/>
      <Parameter name="DEVICE" type="string" value="plughw:TAMAGO03" description="Device name or IP address [default: 127.0.0.1]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SaveWavePCM_1" type="SaveWavePCM" x="370" y="280">
      <Parameter name="BASENAME" type="string" value="rec_files/rec_" description="Basename of files. [default: sep_]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (in samples)[default: 16000]."/>
      <Parameter name="BITS" type="string" value="int24" description="Bit format of samples. int16 and int24  bits are supported."/>
      <Parameter name="INPUT_BITS" type="string" value="auto" description="Bit format of input wav file."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MultiGain_2" type="MultiGain" x="190" y="280">
      <Parameter name="GAIN" type="float" value="100" description="Gain factor."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_localization_1" input="WAV"/>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_separation_1" input="SPEC"/>
    <Link from="node_sub_separation_1" output="POSTFLT_SPEC" to="node_sub_recognition_1" input="SPEC"/>
    <Link from="node_MultiGain_1" output="OUTPUT" to="node_MultiFFT_1" input="INPUT"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_separation_1" input="SRC_INFO"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_recognition_1" input="SOURCES"/>
    <Link from="node_AudioStreamFromMic_1" output="AUDIO" to="node_MultiGain_1" input="INPUT"/>
    <Link from="node_AudioStreamFromMic_1" output="AUDIO" to="node_MultiGain_2" input="INPUT"/>
    <Link from="node_MultiGain_2" output="OUTPUT" to="node_SaveWavePCM_1" input="INPUT"/>
    <NetOutput name="ASR-A" node="node_sub_recognition_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT" node="node_sub_separation_1" terminal="OUTPUT" object_type="any" description="Dynamic"/>
    <NetOutput name="OUTPUT_1" node="node_SaveWavePCM_1" terminal="OUTPUT" object_type="Map&lt;int,ObjectRef&gt;" description="The same as input."/>
    <NetCondition name="CONDITION" node="node_AudioStreamFromMic_1" terminal="NOT_EOF"/>
  </Network>
  <Network type="subnet" name="sub_separation">
    <Node name="node_PowerCalcForMap_1" type="PowerCalcForMap" x="100" y="330">
      <Parameter name="POWER_TYPE" type="string" value="POW" description="Measure for computing the POW or MAG (i.e. power or magnitude) of the complex spectrum [default: POW]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_CalcSpecAddPower_1" type="CalcSpecAddPower" x="760" y="530">
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_CalcSpecSubGain_1" type="CalcSpecSubGain" x="510" y="250">
      <Parameter name="ALPHA" type="float" value="1.0" description="Overestimation factor."/>
      <Parameter name="BETA" type="float" value="0.0" description="Spectral floor."/>
      <Parameter name="SS_METHOD" type="int" value="2" description="1: Magnitude Spectral Subtraction, 2: Power SS"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_EstimateLeak_1" type="EstimateLeak" x="360" y="600">
      <Parameter name="LEAK_FACTOR" type="float" value="0.25" description="Leak factor [default:0.25]"/>
      <Parameter name="OVER_CANCEL_FACTOR" type="float" value="1" description="Over cancel value [default:1]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_HRLE_1" type="HRLE" x="440" y="470">
      <Parameter name="LX" type="float" value="0.10" description="Lx value of estimation, e.g. Lx=0 -&gt; Minimum (MCRA), Lx=0.5 -&gt; Median , Lx=1.0 -&gt; Maximum [default:0.85]"/>
      <Parameter name="TIME_CONST_METHOD" type="string" value="LEGACY" description="Time constant value definition, &quot;LEGACY&quot; uses time constant value for HARK 2.0.0,&quot;MILLISECOND&quot; uses time constant value in frames. [default: LEGACY]"/>
      <Parameter name="TIME_CONSTANT" type="float" value="16000" description="Time constant for exponential decay window in samples [default:]"/>
      <Parameter name="DECAY_FACTOR" type="int" value="1000" description="Time constant for exponential decay window in millisecond [default:]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The length in sample between a frame and a previous frame. [default: 160]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="NUM_BIN" type="float" value="2000" description="Number of histogram bins [default:1000]"/>
      <Parameter name="MIN_LEVEL" type="float" value="-200" description="Minimum level of histogram bin in dB [default:-100]"/>
      <Parameter name="STEP_LEVEL" type="float" value="0.2" description="Step level of histogram bin (Width of each histogram bin) in dB [default:0.2]"/>
      <Parameter name="DEBUG" type="bool" value="false" description="Prints the histogram for each 100 iterations."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpectralGainFilter_1" type="SpectralGainFilter" x="1000" y="100">
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_GHDSS_1" type="GHDSS" x="160" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="0" description="Lower bound of frequency (Hz). [default: 0]"/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="8000" description="Upper bound of frequency (Hz). [default: 8000]"/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="TF_CONJ_FILENAME" type="string" value="hark_conf/tamago_rectf.zip" description="Filename of a pre-measured transfer function for separation."/>
      <Parameter name="INITW_FILENAME" type="string" value="" description="Filename of an initial separation matrix. If specified, a matrix in INITW_FILENAME is used as an initial separation matrix. Otherwise, initial separation matrix is estimated from the geometrical relationship or pre-measured TF according to TF_CONJ."/>
      <Parameter name="SS_METHOD" type="string" value="ADAPTIVE" description="The calculation method for SS step size parameter corresponding to the blind separation part. &quot;FIX&quot; uses a fixed step size,&quot;LC_MYU&quot; uses the same value as LC_MYU, and &quot;ADAPTIVE&quot; adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="SS_SCAL" type="float" value="1.0" description="Scaling factor for SS step size. [default: 1.0]"/>
      <Parameter name="SS_MYU" type="float" value="0.001" description="SS step size value. [default 0.001]"/>
      <Parameter name="NOISE_FLOOR" type="float" value="0.0" description="Noise floor value. [default 0.0]"/>
      <Parameter name="LC_CONST" type="string" value="DIAG" description="The calculation method for geometric constraints. &quot;FULL&quot; uses all elements of a matrix, and &quot;DIAG&quot; only uses diagonal parts. [default: FULL]"/>
      <Parameter name="LC_METHOD" type="string" value="ADAPTIVE" description="The calculation method for LC step size corresponding to geometric constraints. &quot;FIX&quot; uses a fixed value, and &quot;Adaptive&quot; adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="LC_MYU" type="float" value="0.001" description="LC step size value. [default 0.001]"/>
      <Parameter name="UPDATE_METHOD_TF_CONJ" type="string" value="POS" description="Switching method of TF_CONJ data. [default: POS]"/>
      <Parameter name="UPDATE_METHOD_W" type="string" value="ID" description="Switching method of separation matrix, W. [default: ID]"/>
      <Parameter name="UPDATE_ACCEPT_DISTANCE" type="float" value="300" description="Distance allowance to switch separation matrix in [mm]. available when when UPDATE_METHOD_W is POS or ID_POS. [default: 300.0]"/>
      <Parameter name="EXPORT_W" type="bool" value="false" description="Separation matrix W is exported if true. [default: false]"/>
      <Parameter name="EXPORT_W_FILENAME" type="string" value="" description="The filename to export W."/>
      <Parameter name="UPDATE" type="string" value="STEP" description="The update method of separation matrix. &quot;STEP&quot; updates W sequentially, i.e., based on SS and then on LC cost. &quot;TOTAL&quot; updates W based on an integrated value of SS and LC cost [default: STEP]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SaveWavePCM_2" type="SaveWavePCM" x="1210" y="430">
      <Parameter name="BASENAME" type="string" value="sep_files/sep_" description="Basename of files. [default: sep_]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (in samples)[default: 16000]."/>
      <Parameter name="BITS" type="string" value="int24" description="Bit format of samples. int16 and int24  bits are supported."/>
      <Parameter name="INPUT_BITS" type="string" value="auto" description="Bit format of input wav file."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_Synthesize_1" type="Synthesize" x="1220" y="280">
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The length in sample between a frame and a previous frame. [default: 160]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="MIN_FREQUENCY" type="int" value="125" description="Minimum frequency (Hz) [default: 125]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="7900" description="Maximum frequency (Hz) [default: 7900]"/>
      <Parameter name="WINDOW" type="string" value="HAMMING" description="A window function for overlap-add. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: HAMMING]"/>
      <Parameter name="OUTPUT_GAIN" type="float" value="1.0" description="Output gain factor. [default: 1.0]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/&lt;terminal name&gt;' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_HRLE_1" output="NOISE_SPEC" to="node_CalcSpecAddPower_1" input="INPUT_POWER_SPEC1"/>
    <Link from="node_EstimateLeak_1" output="LEAK_POWER_SPEC" to="node_CalcSpecAddPower_1" input="INPUT_POWER_SPEC2"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_CalcSpecSubGain_1" input="INPUT_POWER_SPEC"/>
    <Link from="node_CalcSpecAddPower_1" output="OUTPUT_POWER_SPEC" to="node_CalcSpecSubGain_1" input="NOISE_SPEC"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_EstimateLeak_1" input="INPUT_POWER_SPEC"/>
    <Link from="node_PowerCalcForMap_1" output="OUTPUT" to="node_HRLE_1" input="INPUT_SPEC"/>
    <Link from="node_CalcSpecSubGain_1" output="VOICE_PROB" to="node_SpectralGainFilter_1" input="VOICE_PROB"/>
    <Link from="node_CalcSpecSubGain_1" output="GAIN" to="node_SpectralGainFilter_1" input="GAIN"/>
    <Link from="node_GHDSS_1" output="OUTPUT" to="node_PowerCalcForMap_1" input="INPUT"/>
    <Link from="node_GHDSS_1" output="OUTPUT" to="node_SpectralGainFilter_1" input="INPUT_SPEC"/>
    <Link from="node_Synthesize_1" output="OUTPUT" to="node_SaveWavePCM_2" input="INPUT"/>
    <Link from="node_SpectralGainFilter_1" output="OUTPUT_SPEC" to="node_Synthesize_1" input="INPUT"/>
    <NetInput name="SPEC" node="node_GHDSS_1" terminal="INPUT_FRAMES" object_type="Matrix<complex<float> >" description="Input multi-channel spectrum. A row is a channel, and a column is a spectrum for the corresponding channel."/>
    <NetInput name="SRC_INFO" node="node_GHDSS_1" terminal="INPUT_SOURCES" object_type="Vector<ObjectRef>" description="Source locations with ID. Each element of the vector is a source location with ID specified by "Source"."/>
    <NetOutput name="POSTFLT_SPEC" node="node_SpectralGainFilter_1" terminal="OUTPUT_SPEC" object_type="Map<int,ObjectRef>" description="Estimated voice spectrum(Vector<complex<float> >) with a key(source ID)."/>
    <NetOutput name="OUTPUT" node="node_SaveWavePCM_2" terminal="OUTPUT" object_type="Map<int,ObjectRef>" description="The same as input."/>
  </Network>
  <Network type="subnet" name="sub_recognition">
    <Node name="node_Delta_1" type="Delta" x="100" y="310">
      <Parameter name="FBANK_COUNT" type="int" value="41" description="The size of the input feature vector."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_FeatureRemover_1" type="FeatureRemover" x="370" y="310">
      <Parameter name="SELECTOR" type="object" value="<Vector<int> 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81>" description="Component indices in a feature vector to remove. E.g. &lt;Vector&lt;int&gt; 13&gt; to remove 14th component (indices start at 0)."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MSLSExtraction_1" type="MSLSExtraction" x="800" y="130">
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="Size of the static part of MSLS feature vector. [default: 13]"/>
      <Parameter name="NORMALIZATION_MODE" type="string" value="SPECTRAL" description="The domain to perform normalization. CEPSTRAL or SPECTRAL. [default: CEPSTRAL]"/>
      <Parameter name="USE_LEGACY_MODE" type="bool" value="true" description="For more than 14 dimensions must use false. This parameter is preparing only for compatibility with HARK 2.x or earlier. [default: true]"/>
      <Parameter name="USE_HTK_LIFTER" type="bool" value="false" description="Use HTK liftering vector if true. [default: false]"/>
      <Parameter name="LIFTERING_COEF" type="int" value="22" description="The HTK liftering coefficient used in Cepstral mode. [default: 22]"/>
      <Parameter name="USE_POWER" type="bool" value="true" description="Use power feature if true. [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MelFilterBank_1" type="MelFilterBank" x="570" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz.  [default: 16000]"/>
      <Parameter name="CUTOFF" type="int" value="8000" description="Cutoff frequency in Hz. Mel-filterbanks are placed between 0 Hz and CUTOFF Hz. [default: 8000]"/>
      <Parameter name="MIN_FREQUENCY" type="int" value="63" description="Minimum frequency (Hz) [default: 63]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="8000" description="Maximum frequency (Hz) [default: 8000]"/>
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="The number of Mel filter banks. [default: 13]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_PreEmphasis_1" type="PreEmphasis" x="350" y="150">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="window length in sample [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz [default: 16000]"/>
      <Parameter name="PREEMCOEF" type="float" value="0.97" description="pre-emphasis coefficient [default: 0.97]"/>
      <Parameter name="INPUT_TYPE" type="string" value="SPECTRUM" description="The domain to perform pre-emphasis [default: WAV]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpeechRecognitionClient_1" type="SpeechRecognitionClient" x="690" y="310">
      <Parameter name="MFM_ENABLED" type="bool" value="false" description="MFM is enabled if true. [default: true]"/>
      <Parameter name="HOST" type="string" value="127.0.0.1" description="Hostname or IP of Julius/Julian server. [default: 127.0.0.1]"/>
      <Parameter name="PORT" type="int" value="5530" description="Port number of Julius/Julian server. [default: 5530]"/>
      <Parameter name="SOCKET_ENABLED" type="bool" value="true" description="send data via socket if true. [default: true]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_WhiteNoiseAdder_1" type="WhiteNoiseAdder" x="100" y="150">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="WN_LEVEL" type="float" value="0.1" description="An amplitude of white noise to be added. [default: 0]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MelFilterBank_1" output="OUTPUT" to="node_MSLSExtraction_1" input="FBANK"/>
    <Link from="node_MSLSExtraction_1" output="OUTPUT" to="node_Delta_1" input="INPUT"/>
    <Link from="node_Delta_1" output="OUTPUT" to="node_FeatureRemover_1" input="INPUT"/>
    <Link from="node_FeatureRemover_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="FEATURES"/>
    <Link from="node_FeatureRemover_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="MASKS"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MelFilterBank_1" input="INPUT"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MSLSExtraction_1" input="SPECTRUM"/>
    <Link from="node_WhiteNoiseAdder_1" output="OUTPUT" to="node_PreEmphasis_1" input="INPUT"/>
    <NetInput name="SPEC" node="node_WhiteNoiseAdder_1" terminal="INPUT" object_type="Map<int,ObjectRef>" description="Input spectrum. The key is source ID, and the value is a spectrum (Vector<complex<float> >)."/>
    <NetInput name="SOURCES" node="node_SpeechRecognitionClient_1" terminal="SOURCES" object_type="Vector<ObjectRef>" description="Source locations with ID. Each element of the vector is a source location with ID specified by "Source"."/>
    <NetOutput name="ASR-A" node="node_SpeechRecognitionClient_1" terminal="OUTPUT" object_type="Vector<ObjectRef>" description="The same as SOURCES."/>
  </Network>
  <Network type="subnet" name="sub_localization">
    <Node name="node_LocalizeMUSIC_1" type="LocalizeMUSIC" x="100" y="100">
      <Parameter name="MUSIC_ALGORITHM" type="string" value="SEVD" description="Sound Source Localization Algorithm. If SEVD, NOISECM will be ignored"/>
      <Parameter name="TF_CHANNEL_SELECTION" type="object" value="<Vector<int> 0 1 2 3 4 5 6 7>" description="Microphone channels for localization. If vacant, all channels will be used."/>
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling Rate (Hz)."/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load from a TF file or the input terminal."/>
      <Parameter name="A_MATRIX" type="string" value="hark_conf/tamago_rectf.zip" description="Filename of a transfer function matrix."/>
      <Parameter name="WINDOW" type="int" value="50" description="The number of frames used for calculating a correlation function."/>
      <Parameter name="WINDOW_TYPE" type="string" value="MIDDLE" description="Window selection to accumulate a correlation function. If PAST, the past WINDOW frames from the current frame are used for the accumulation. If MIDDLE, the current frame will be the middle of the accumulated frames. If FUTURE, the future WINDOW frames from the current frame are used for the accumulation. FUTURE is the default from version 1.0, but this introduces a delay since we have to wait for the future information. PAST generates internal buffers for the accumulation, which introduces no delay for localization."/>
      <Parameter name="PERIOD" type="int" value="50" description="The period in which the source localization is processed."/>
      <Parameter name="NUM_SOURCE" type="int" value="2" description="Number of sources, which should be less than number of channels."/>
      <Parameter name="MIN_DEG" type="int" value="-180" description="source direction (lower)."/>
      <Parameter name="MAX_DEG" type="int" value="180" description="source direction (higher)."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="3000" description="Lower bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="6000" description="Upper bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="SPECTRUM_WEIGHT_TYPE" type="string" value="A_Characteristic" description="MUSIC spectrum weight for each frequency bin."/>
      <Parameter name="A_CHAR_SCALING" type="float" value="1.0" description="Scaling factor of the A-Weight with respect to frequency"/>
      <Parameter name="MANUAL_WEIGHT_SPLINE" type="object" value="<Matrix<float> <rows 2> <cols 5> <data 0.0 2000.0 4000.0 6000.0 8000.0 1.0 1.0 1.0 1.0 1.0> >" description="MUSIC spectrum weight for each frequency bin. This is a 2 by M matrix. The first row represents the frequency, and the second row represents the weight gain. "M" represents the number of key points for the spectrum weight. The frequency range between M key points will be interpolated by spline manner. The format is "&lt;Matrix&lt;float&gt; &lt;rows 2&gt; &lt;cols 2&gt; &lt;data 1 2 3 4&gt; &gt;"."/>
      <Parameter name="MANUAL_WEIGHT_SQUARE" type="object" value="<Vector<float> 0.0 2000.0 4000.0 6000.0 8000.0>" description="MUSIC spectrum weight for each frequency bin. This is a M order vector. The element represents the frequency points for the square wave. "M" represents the number of key points for the square wave weight. The format is "&lt;Vector&lt;float&gt; 1 2 3 4&gt;"."/>
      <Parameter name="ENABLE_EIGENVALUE_WEIGHT" type="bool" value="false" description="If true, the spatial spectrum is weighted depending on the eigenvalues of a correlation matrix. We do not suggest to use this function with GEVD and GSVD, because the NOISECM changes the eigenvalue drastically. Only useful for SEVD."/>
      <Parameter name="MAXNUM_OUT_PEAKS" type="int" value="-1" description="Maximum number of output peaks. If MAXNUM_OUT_PEAKS = NUM_SOURCE, this is compatible with HARK version 1.0. If MAXNUM_OUT_PEAKS = 0, all local maxima are output. If MAXNUM_OUT_PEAKS &lt; 0, MAXNUM_OUT_PEAKS is set to NUM_SOURCE. If MAXNUM_OUT_PEAKS &gt; 0, number of output peaks is limited to MAXNUM_OUT_PEAKS."/>
      <Parameter name="DEBUG" type="bool" value="true" description="Debug option. If the parameter is true, this node outputs sound localization results to a standard output."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceTracker_1" type="SourceTracker" x="320" y="100">
      <Parameter name="THRESH" type="float" value="27.0" description="Power threshold for localization results. A localization result with higher power than THRESH is tracked, otherwise ignored."/>
      <Parameter name="PAUSE_LENGTH" type="float" value="1200" description="Life duration of a source in ms. When no localization result for a source is found for more than PAUSE_LENGTH / 10 iterations, the source is terminated. [default: 800]"/>
      <Parameter name="MIN_SRC_INTERVAL" type="float" value="20" description="Source interval threshold in degree. When the angle between a localization result and a source is smaller than MIN_SRC_INTERVAL, the same ID is given to the localization result. [default: 20]"/>
      <Parameter name="MIN_ID" type="int" value="0" description="Minimum ID of source locations. MIN_ID should be greater than or equal to 0."/>
      <Parameter name="DEBUG" type="bool" value="false" description="Output debug information if true [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceIntervalExtender_1" type="SourceIntervalExtender" x="550" y="100">
      <Parameter name="PREROLL_LENGTH" type="int" value="80" description="Preroll length in frame. [default: 50]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_plotQuickSourceKivy_1" type="plotQuickSourceKivy" x="840" y="100">
    </Node>
    <Link from="node_LocalizeMUSIC_1" output="OUTPUT" to="node_SourceTracker_1" input="INPUT"/>
    <Link from="node_SourceTracker_1" output="OUTPUT" to="node_SourceIntervalExtender_1" input="SOURCES"/>
    <Link from="node_SourceIntervalExtender_1" output="OUTPUT" to="node_plotQuickSourceKivy_1" input="SOURCES"/>
    <NetInput name="WAV" node="node_LocalizeMUSIC_1" terminal="INPUT" object_type="Matrix<complex<float> >" description="Multi-channel audio signals. In this matrix, a row is a channel, and a column is a sample."/>
    <NetOutput name="OUTPUT" node="node_plotQuickSourceKivy_1" terminal="OUTPUT" object_type="any" description=""/>
  </Network>
</Document>
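In the sub_separation subnet above, the post-filter works per source and per frequency bin: PowerCalcForMap turns the GHDSS output into a power spectrum, HRLE estimates the stationary noise (NOISE_SPEC), EstimateLeak estimates the leakage from the other separated sources (LEAK_POWER_SPEC), CalcSpecAddPower sums the two, CalcSpecSubGain derives a gain, and SpectralGainFilter multiplies the GHDSS spectrum by that gain. The Python sketch below mirrors only this data flow for one frame of one source. The function names, the toy numbers, and the floor value are hypothetical, the VOICE_PROB output is omitted, and the gain is a textbook power spectral subtraction rule; HARK's CalcSpecSubGain may use a different formula and parameters.

```python
import math
from typing import List


def total_noise_power(stationary: List[float], leak: List[float]) -> List[float]:
    """Per-bin sum of two noise power estimates (the role CalcSpecAddPower plays)."""
    return [s + l for s, l in zip(stationary, leak)]


def subtraction_gain(power: List[float], noise: List[float], floor: float = 0.1) -> List[float]:
    """Amplitude gain sqrt(max(1 - N/P, 0)), never below `floor` (hypothetical rule)."""
    gains = []
    for p, n in zip(power, noise):
        g = math.sqrt(max(1.0 - n / p, 0.0)) if p > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains


def apply_gain(spectrum: List[complex], gains: List[float]) -> List[complex]:
    """Scale each complex bin by its gain (the role SpectralGainFilter plays)."""
    return [x * g for x, g in zip(spectrum, gains)]


# One frame of one separated source (toy numbers, three bins).
ghdss_out = [2 + 0j, 1 + 1j, 0.5 + 0j]
power = [abs(x) ** 2 for x in ghdss_out]  # like PowerCalcForMap
noise = total_noise_power([1.0, 0.5, 0.2], [0.0, 0.5, 0.1])
gains = subtraction_gain(power, noise)
enhanced = apply_gain(ghdss_out, gains)
```

Bins where the estimated noise exceeds the separated power (the third bin here) are pulled down to the floor instead of being zeroed, which is a common way to limit musical noise.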

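In sub_recognition, PreEmphasis runs with INPUT_TYPE=SPECTRUM and PREEMCOEF=0.97, and MelFilterBank places FBANK_COUNT filters between MIN_FREQUENCY=63 Hz and MAX_FREQUENCY=8000 Hz. The sketch below illustrates both steps under stated assumptions. Pre-emphasis is written as the frequency response of the first-order filter y[n] = x[n] - 0.97 x[n-1], that is H[k] = 1 - 0.97 exp(-j 2 pi k / N), applied bin by bin with N = LENGTH = 512 (assumed, matching the documented default). The band edges use the common mel formula mel(f) = 2595 log10(1 + f / 700) with FBANK_COUNT = 40 (assumed; the actual value is a subnet parameter set outside this listing). HARK's own nodes may differ in detail, and all function names are hypothetical.

```python
import cmath
import math
from typing import List

PREEMCOEF = 0.97   # PreEmphasis PREEMCOEF in the network file
LENGTH = 512       # assumed frame length (subnet parameter, default 512)
FBANK_COUNT = 40   # assumed; set as a subnet parameter outside this listing


def preemphasis_response(length: int = LENGTH, coef: float = PREEMCOEF) -> List[complex]:
    """H[k] = 1 - coef * exp(-j 2 pi k / length) for the length/2 + 1 non-negative bins."""
    return [1 - coef * cmath.exp(-2j * math.pi * k / length) for k in range(length // 2 + 1)]


def preemphasize(spectrum: List[complex], response: List[complex]) -> List[complex]:
    """Spectral-domain pre-emphasis: multiply each bin by the filter response."""
    return [x * h for x, h in zip(spectrum, response)]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(fmin: float, fmax: float, count: int) -> List[float]:
    """`count` triangular filters need count + 2 edge points spaced evenly in mel."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (count + 1)
    return [mel_to_hz(lo + i * step) for i in range(count + 2)]


h = preemphasis_response()
edges = mel_band_edges(63.0, 8000.0, FBANK_COUNT)
```

At DC the gain is 1 - 0.97 = 0.03 and at the Nyquist bin (k = 256) it is 1 + 0.97 = 1.97, which is why PreEmphasis is described as emphasizing the high-frequency region.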
 
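The feature dimensions in sub_recognition fit together as follows, assuming FBANK_COUNT = 40 for MSLSExtraction (consistent with Delta's FBANK_COUNT of 41 once USE_POWER=true adds one power term) and assuming the vector is laid out as the static features followed by one delta per static component. Under those assumptions the vector entering FeatureRemover has 82 components (indices 0 to 81), and SELECTOR 40 to 81 drops the power term and every delta, leaving the 40 static MSLS coefficients that SpeechRecognitionClient sends to the recognizer. The helper names below are hypothetical.

```python
from typing import List

MSLS_FBANK_COUNT = 40           # assumed value of the FBANK_COUNT subnet parameter
USE_POWER = True                # MSLSExtraction USE_POWER in the network file
SELECTOR = list(range(40, 82))  # FeatureRemover SELECTOR: 40 41 ... 81


def feature_layout(fbank_count: int, use_power: bool) -> List[str]:
    """Assumed layout: static MSLS (+ power), then one delta per static component."""
    static = [f"msls[{i}]" for i in range(fbank_count)]
    if use_power:
        static.append("power")
    deltas = [f"delta({name})" for name in static]
    return static + deltas


def remove_components(names: List[str], selector: List[int]) -> List[str]:
    """Drop the 0-based indices listed in `selector`, as FeatureRemover's description states."""
    drop = set(selector)
    return [name for i, name in enumerate(names) if i not in drop]


layout = feature_layout(MSLS_FBANK_COUNT, USE_POWER)
kept = remove_components(layout, SELECTOR)
print(len(layout), len(layout) // 2, len(kept))  # prints: 82 41 40
```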

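sub_localization chains LocalizeMUSIC, SourceTracker, and SourceIntervalExtender. The parameter descriptions above are enough to sketch the last two: a peak is tracked only if its power exceeds THRESH (27.0), it reuses an existing ID when it lies within MIN_SRC_INTERVAL (20 degrees) of that source, and a source ends after PAUSE_LENGTH / 10 = 120 iterations without a matching peak; SourceIntervalExtender then moves each source's start back by PREROLL_LENGTH = 80 frames so that the head of the separated speech is not cut. If one iteration is one 10 ms frame (ADVANCE = 160 samples at 16 kHz, assumed to match the main network), those figures correspond to 1.2 s and 0.8 s. The Python below is a simplified, offline re-implementation of that described behaviour with hypothetical class and function names; it is not HARK's code (for example, the real SourceIntervalExtender works on a stream, and the real tracker may match peaks differently).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Values copied from the network file above.
THRESH = 27.0             # SourceTracker THRESH (power)
PAUSE_LENGTH_MS = 1200.0  # SourceTracker PAUSE_LENGTH
MIN_SRC_INTERVAL = 20.0   # SourceTracker MIN_SRC_INTERVAL (degrees)
MIN_ID = 0                # SourceTracker MIN_ID
PREROLL_LENGTH = 80       # SourceIntervalExtender PREROLL_LENGTH (frames)


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


@dataclass
class Track:
    azimuth: float
    missed: int = 0  # consecutive iterations without a matching peak


class SimpleTracker:
    """Hypothetical, simplified stand-in for SourceTracker."""

    def __init__(self) -> None:
        self.max_missed = PAUSE_LENGTH_MS / 10.0  # PAUSE_LENGTH / 10 iterations
        self.next_id = MIN_ID
        self.tracks: Dict[int, Track] = {}

    def step(self, peaks: List[Tuple[float, float]]) -> List[int]:
        """Process one iteration of (azimuth, power) peaks; return live source IDs."""
        seen = set()
        for azimuth, power in peaks:
            if power <= THRESH:
                continue  # only peaks stronger than THRESH are tracked
            # The nearest existing source within MIN_SRC_INTERVAL keeps its ID.
            best: Optional[int] = None
            for sid, tr in self.tracks.items():
                d = angle_diff(azimuth, tr.azimuth)
                if sid not in seen and d < MIN_SRC_INTERVAL:
                    if best is None or d < angle_diff(azimuth, self.tracks[best].azimuth):
                        best = sid
            if best is None:  # otherwise a new source with a fresh ID starts
                best = self.next_id
                self.next_id += 1
                self.tracks[best] = Track(azimuth)
            self.tracks[best].azimuth = azimuth
            self.tracks[best].missed = 0
            seen.add(best)
        # Sources without a matching peak age and end after max_missed iterations.
        for sid in list(self.tracks):
            if sid not in seen:
                self.tracks[sid].missed += 1
                if self.tracks[sid].missed > self.max_missed:
                    del self.tracks[sid]
        return sorted(self.tracks)


def extend_onsets(frames: List[List[int]], preroll: int = PREROLL_LENGTH) -> List[List[int]]:
    """Offline version of the preroll idea: mark each ID active `preroll` frames early."""
    out = [set(ids) for ids in frames]
    first: Dict[int, int] = {}
    for t, ids in enumerate(frames):
        for sid in ids:
            first.setdefault(sid, t)
    for sid, t0 in first.items():
        for t in range(max(0, t0 - preroll), t0):
            out[t].add(sid)
    return [sorted(s) for s in out]
```

For example, peaks at 30 and -60 degrees in the first iteration receive IDs 0 and 1; a peak at 35 degrees in the next iteration keeps ID 0, while a peak below THRESH is ignored.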
Practice2 Summary

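  • Goal of this practice: experience how localization, separation, and recognition are implemented
  • What you can now do
    • Understand HARK's main processing steps
    • Build a recognition system for two simultaneous speakers
      • 2-1. Overview of sound source separation and speech recognition in HARK
      • 2-2. Structure of the sample network file
      • 2-3. Running the processing on pre-recorded audio and checking the results
      • 2-4. Calculating the recognition rate
      • 2-5. Hands-on online sound source separation and speech recognition
  • What to learn next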

Next, take on Practice3.