2.7 Conclusion

We have described our vision for robot audition research, which centers on creating “a robot that hears sounds with its own ears,” and our expectations for its future development. Because robot audition research started from almost nothing, we have tried to promote not only our own work but also related work by others. We obtained the cooperation of academic researchers such as Asano (AIST), Kobayashi (Waseda University), and Saruwatari et al. (NAIST); of companies developing robot audition such as NEC, Hitachi, Toshiba, and HRI-JP; and of overseas research institutes such as the University of Sherbrooke in Canada, KIST in Korea, LAAS in France, and HRI-EU in Germany. With them, we have organized robot audition sessions at IEEE/RSJ IROS for the past six years and special sessions at the academic conference of the Robotics Society of Japan for the past five years. Furthermore, in 2009 we organized a special session on robot audition at ICASSP-2009, the International Conference on Acoustics, Speech and Signal Processing, organized by the IEEE Signal Processing Society. As this research community has grown, the number of researchers worldwide has slowly increased, and the high level of Japan’s robot audition research stands out in particular. We expect that further development of this research will enable the Prince Shotoku robot to support hearing-impaired and elderly people and, beyond that, contribute to building a peaceful society.

“At sixty, one learns to listen to what others say,” from “The Analects of Confucius / Government”
It is said that a person learns to listen to what others say at the age of sixty. However, the sensitivity of the auditory organs to high frequencies declines with overuse or age, and a person may lose the ability to follow conversations, so that they cannot rely on their ears even if they wish to.

[1]: Nakadai, Mitsunaga, Okuno (Eds.). Special Issue on Robot Audition, Journal of the Robotics Society of Japan, Vol.28, No.1 (Jan. 2010) (in Japanese).
[2]: C. Côté, et al. Code Reusability Tools for Programming Mobile Robots, IEEE/RSJ IROS 2004, pp.1820–1825.
[3]: J.-M. Valin, F. Michaud, B. Hadjou, J. Rouat. Localization of simultaneous moving sound sources for mobile robot using a frequency-domain steered beamformer approach. IEEE ICRA 2004, pp.1033–1038.
[4]: S. Yamamoto, J.-M. Valin, K. Nakadai, T. Ogata, and H. G. Okuno. Enhanced robot speech recognition based on microphone array source separation and missing feature theory. IEEE ICRA 2005, pp.1427–1482.
[5]: Okuno, Nakadai. Robot Audition Open-Source Software HARK, Journal of the Robotics Society of Japan, Vol.28, No.1 (Jan. 2010) 6–9, Robotics Society of Japan (in Japanese).
[6]: K. Nakadai, T. Takahashi, H.G. Okuno, H. Nakajima, Y. Hasegawa, H. Tsujino. Design and Implementation of Robot Audition System "HARK", Advanced Robotics, Vol.24 (2010) 739–761, VSP and RSJ.
[7]: K. Nakamura, K. Nakadai, F. Asano, Y. Hasegawa, and H. Tsujino, “Intelligent Sound Source Localization for Dynamic Environments”, in Proc. of IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems (IROS 2009), pp. 664–669, 2009.
[8]: H. Nakajima, K. Nakadai, Y. Hasegawa, H. Tsujino. Blind Source Separation With Parameter-Free Adaptive Step-Size Method for Robot Audition, IEEE Transactions on Audio, Speech, and Language Processing, Vol.18, No.6 (Aug. 2010) 1467–1485, IEEE.
[9]: D. Rosenthal and H.G. Okuno (Eds.). Computational Auditory Scene Analysis, Lawrence Erlbaum Associates, 1998.
[10]: Bregman, A.S. Auditory Scene Analysis – the Perceptual Organization of Sound, MIT Press (1990).
[11]: H.G. Okuno, T. Nakatani, T. Kawabata. Interfacing Sound Stream Segregation to Automatic Speech Recognition – Preliminary Results on Listening to Several Sounds Simultaneously, Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-1996), 1082–1089, AAAI, Portland, Aug. 1996.
[12]: Proceedings of the SIG on AI Challenges, Japanese Society for Artificial Intelligence (in Japanese). Available on the web: http://winnie.kuis.kyoto-u.ac.jp/AI-Challenge/
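[13]: Y. Nishimura, T. Shinozaki, K. Iwano, S. Furui. A study of speech recognition using weighted likelihoods for each frequency band, Proceedings of the 2004 Spring Meeting of the Acoustical Society of Japan, Acoustical Society of Japan, Vol.1, pp.117–118, 2004 (in Japanese).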
[14]: Nakadai, K., Lourens, T., Okuno, H.G., and Kitano, H. Active Audition for Humanoid. In Proc. of AAAI-2000, pp.832–839, AAAI, Jul. 2000.
[15]: Nakadai, K., Hidai, T., Mizoguchi, H., Okuno, H.G., and Kitano, H. Real-Time Auditory and Visual Multiple-Object Tracking for Robots, In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI-2001), pp.1425–1432, IJCAI, 2001.
[16]: Nakadai, K., Matsuura, D., Okuno, H.G., and Tsujino, H. Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots, Speech Communication, Vol.44, No.1–4 (2004) pp.97–112, Elsevier.
[17]: Nakadai, K., Yamamoto, S., Okuno, H.G., Nakajima, H., Hasegawa, Y., Tsujino, H. A Robot Referee for Rock-Paper-Scissors Sound Games, Proceedings of IEEE-RAS International Conference on Robotics and Automation (ICRA-2008), pp.3469–3474, IEEE, May 20, 2008. doi:10.1109/ROBOT.2008.4543741
[18]: Kubota, Y., Yoshida, M., Komatani, K., Ogata, T., Okuno, H.G. Design and Implementation of 3D Auditory Scene Visualizer towards Auditory Awareness with Face Tracking, Proceedings of IEEE International Symposium on Multimedia (ISM2008), pp.468–476, Berkeley, Dec. 16, 2008. doi:10.1109/ISM.2008.107
[19]: Kubota, Y., Shiramatsu, S., Yoshida, M., Komatani, K., Ogata, T., Okuno, H.G. 3D Auditory Scene Visualizer With Face Tracking: Design and Implementation For Auditory Awareness Compensation, Proceedings of 2nd International Symposium on Universal Communication (ISUC2008), pp.42–49, IEEE, Osaka, Dec. 15, 2008. doi:10.1109/ISUC.2008.59
[20]: Kashino, M., and Hirahara, T. One, two, many – Judging the number of concurrent talkers, Journal of the Acoustical Society of America, Vol.99, No.4 (1996), Pt.2, 2596.
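[21]: K. Tokuda, K. Komatani, T. Ogata, H.G. Okuno. A sound environment understanding support system for hearing-impaired people that presents sound source localization and speech recognition results together on an HMD, 70th National Convention of the Information Processing Society of Japan, 5ZD-7, Mar. 2008 (in Japanese).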
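[22]: H.G. Okuno, K. Nakadai. Challenges and current state of robot audition, IPSJ Magazine (Joho Shori), Vol.44, No.11 (2003) pp.1138–1144, Information Processing Society of Japan (in Japanese).
[23]: H.G. Okuno, H. Mizoguchi. Current state and challenges of information integration for robot audition, Journal of the Society of Instrument and Control Engineers (Keisoku to Seigyo), Vol.46, No.6 (2007) pp.415–419, Society of Instrument and Control Engineers (in Japanese).
[24]: H.G. Okuno, S. Yamamoto. Computing for sound environment understanding, Journal of the Japanese Society for Artificial Intelligence, Vol.22, No.6 (2007) pp.846–854, Japanese Society for Artificial Intelligence (in Japanese).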
[25]: Takeda, R., Nakadai, K., Komatani, K., Ogata, T., and Okuno, H.G. Exploiting Known Sound Sources to Improve ICA-based Robot Audition in Speech Separation and Recognition, In Proc. of IEEE/RSJ IROS-2007, pp.1757–1762, 2007.
[26]: Tasaki, T., Matsumoto, S., Ohba, H., Yamamoto, S., Toda, M., Komatani, K., Ogata, T., and Okuno, H.G. Dynamic Communication of Humanoid Robot with Multiple People Based on Interaction Distance, Transactions of the Japanese Society for Artificial Intelligence, Vol.20, No.3 (Mar. 2005) pp.209–219, Japanese Society for Artificial Intelligence.
[27]: H-D. Kim, K. Komatani, T. Ogata, H.G. Okuno. Binaural Active Audition for Humanoid Robots to Localize Speech over Entire Azimuth Range, Applied Bionics and Biomechanics, Special Issue on "Humanoid Robots", Vol.6, Issue 3 & 4 (Sep. 2009) pp.355–368, Taylor & Francis, 2009.