NAISTAR

Please use this identifier to cite or link to this item: http://hdl.handle.net/10061/7776

Title: A Microphone Array-based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
Authors: Panikos Heracleous
Satoshi Nakamura
Takeshi Yamada
Kiyohiro Shikano
Keywords: 3-D N-best search
multiple sound sources
microphone array
Issue Date: Jun-2002
Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)
Journal Title: IEICE Transactions on Information and Systems
Volume: E85-D
Issue: 6
Start page: 994
End page: 1002
Abstract: This paper describes a method for hands-free speech recognition, in particular for the simultaneous recognition of multiple sound sources. The method extends the 3-D Viterbi search to a 3-D N-best search, which enables the recognition of multiple sound sources. The baseline system integrates two existing technologies, 3-D Viterbi search and conventional N-best search, into a complete system. An earlier, first evaluation of the baseline 3-D N-best search-based system showed that new ideas were needed to build a system for the simultaneous recognition of multiple sound sources. That evaluation identified two factors that play important roles in system performance: the different likelihood ranges of the sound sources, and the direction-based separation of the hypotheses. To address these problems, we incorporated likelihood normalization and a path distance-based clustering technique into the baseline 3-D N-best search-based system. The system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements from both techniques. The best results were obtained by combining the two techniques with a 32-channel microphone array: the Word Accuracy for the two talkers was higher than 80%, and the Simultaneous Word Accuracy (where both sources are correctly recognized simultaneously) was higher than 70%, which are very promising results.
URI: http://hdl.handle.net/10061/7776
URL: https://search.ieice.org/
Fulltext: http://ci.nii.ac.jp/naid/110003210648
ISSN: 0916-8532
Rights: Copyright (C) 2002 The Institute of Electronics, Information and Communication Engineers (IEICE).
Text Version: publisher
Appears in Collections: Graduate School of Information Science
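
The abstract names two additions to the baseline N-best search: likelihood normalization and path distance-based clustering. The Python sketch below illustrates one way these two ideas could fit together. It is not the authors' implementation. It assumes that each N-best hypothesis carries the per-frame direction of its 3-D Viterbi path. Every other element is a hypothetical choice made for illustration: the `Hypothesis` type, the mean-absolute-difference path distance, greedy leader clustering with a 15-degree threshold, normalization by subtracting each cluster's best score, and the rule that the largest clusters correspond to the talkers.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One N-best entry. Hypothetical structure: word sequence, total
    log-likelihood, and the direction (degrees) of its search path per frame."""
    words: Tuple[str, ...]
    log_likelihood: float
    path: Tuple[float, ...]


def path_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-frame difference between two direction paths."""
    if len(a) != len(b):
        raise ValueError("paths must have the same number of frames")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cluster_by_path(hyps: List[Hypothesis], threshold: float) -> List[List[Hypothesis]]:
    """Greedy leader clustering in descending likelihood order: a hypothesis
    joins the first cluster whose leader path is within `threshold` degrees,
    otherwise it starts a new cluster. Each cluster's first entry is its best."""
    clusters: List[List[Hypothesis]] = []
    for h in sorted(hyps, key=lambda x: x.log_likelihood, reverse=True):
        for cluster in clusters:
            if path_distance(cluster[0].path, h.path) <= threshold:
                cluster.append(h)
                break
        else:
            clusters.append([h])
    return clusters


def normalized_nbest(clusters: List[List[Hypothesis]]) -> List[Tuple[float, Hypothesis]]:
    """Shift each cluster's scores so its best hypothesis scores 0.0, then merge.
    This removes the offset between sources with different likelihood ranges."""
    merged: List[Tuple[float, Hypothesis]] = []
    for cluster in clusters:
        best = cluster[0].log_likelihood
        merged.extend((h.log_likelihood - best, h) for h in cluster)
    # Sort on the normalized score only; the sort is stable, so ties keep cluster order.
    return sorted(merged, key=lambda pair: pair[0], reverse=True)


def best_per_source(clusters: List[List[Hypothesis]], num_sources: int) -> List[Hypothesis]:
    """Return the best hypothesis of each of the `num_sources` largest clusters.
    Assumption: the largest clusters correspond to real talkers."""
    ranked = sorted(clusters, key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:num_sources]]


def constant_path(deg: float, frames: int = 5) -> Tuple[float, ...]:
    """Toy helper: a talker that stays at one direction for every frame."""
    return (deg,) * frames


# Toy N-best list: talker A near 30 degrees scores far higher than talker B near 120.
nbest = [
    Hypothesis(("turn", "left"), -1000.0, constant_path(30.0)),
    Hypothesis(("turn", "lift"), -1004.0, constant_path(31.0)),
    Hypothesis(("burn", "left"), -1009.0, constant_path(29.0)),
    Hypothesis(("stop", "now"), -1500.0, constant_path(120.0)),
    Hypothesis(("stop", "how"), -1503.0, constant_path(118.0)),
]

if __name__ == "__main__":
    clusters = cluster_by_path(nbest, threshold=15.0)
    for h in best_per_source(clusters, num_sources=2):
        print(" ".join(h.words))  # prints "turn left", then "stop now"
```

On this toy list, the top two unnormalized entries both come from talker A. After clustering and per-cluster normalization, the top two entries of the merged list are one hypothesis per talker. Subtracting the cluster maximum is only one possible normalization, and the paper may use a different one.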

Files in This Item:

File                              Size     Format
IEICETransInfoSys_E85D_6_994.pdf  6.05 MB  Adobe PDF
