2.7 Conclusion

We have described our vision for robot audition research, founded on the goal of creating “a robot that hears sounds with its own ears,” and our expectations for its future development. Because robot audition research started from almost nothing, we have worked to promote not only our own studies but also the field as a whole. We have enjoyed the cooperation of academic colleagues such as Asano (AIST), Kobayashi (Waseda University), and Saruwatari et al. (NAIST); of companies developing robot audition, including NEC, Hitachi, Toshiba, and HRI-JP; and of overseas research institutes such as the University of Sherbrooke in Canada, KIST in Korea, LAAS in France, and HRI-EU in Germany. Together we have organized robot audition sessions at IEEE/RSJ IROS for the past six years and special sessions at the annual conferences of the Robotics Society of Japan for the past five years. Furthermore, in 2009 we organized a special session on robot audition at ICASSP-2009, the International Conference on Acoustics, Speech and Signal Processing, run by the IEEE Signal Processing Society. As this research community has grown, the number of researchers worldwide has been increasing steadily, and the high level of Japan’s robot audition research stands out. We expect that a “Prince Shotoku” robot will support hearing-impaired persons and elderly people, and that continued development of this research will contribute to the construction of a peaceful society.

Learning to listen to what others say at sixty years old, from “The Analects of Confucius / On Government”
It is said that a person learns to listen to what others say at age sixty. However, the sensitivity of the auditory organs to high frequencies declines with overwork or age, and such a person may lose the ability to follow conversations; they cannot rely on their ears even if they wish to.
