


Title: Multiple Sound Sources Recognition by a Microphone Array-Based 3-D N-Best Search with Likelihood
Authors: Panikos Heracleous
Satoshi Nakamura
Kiyohiro Shikano
Issue Date: Apr-2001
Start page: 103
End page: 106
Abstract: This paper deals with hands-free speech recognition and, in particular, with the simultaneous recognition of multiple sound sources. Our method is based on the 3-D Viterbi search, extended to a 3-D N-best search that enables the recognition of multiple sound sources. The baseline system integrates two existing technologies, the 3-D Viterbi search and the conventional N-best search, into a complete system. However, the first evaluation of the 3-D N-best search-based system showed that new ideas are necessary in order to build a system for simultaneous recognition of multiple sound sources. Two factors were found to play an important role in the performance of our system: the different likelihood ranges of the sound sources and the direction-based separation of the hypotheses. To address these problems, we implemented likelihood normalization and a path distance-based clustering technique in the baseline 3-D N-best search-based system. The performance of our system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements from the two techniques described above. The best results were obtained by implementing both techniques and using a microphone array composed of 32 elements. In that case, the Word Accuracy for the two talkers was higher than 80% and the Simultaneous Word Accuracy (both sources correctly recognized simultaneously) was higher than 70%, which are very promising results.
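Note: The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes each trial yields one correct/incorrect decision per talker, so accuracy is simply the percentage of correct trials (insertions and deletions are ignored). The function name `word_accuracies` and the trial data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def word_accuracies(results: Sequence[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Return (accuracy of talker A, accuracy of talker B, simultaneous accuracy) in percent.

    results[i] is (talker A correct, talker B correct) for trial i.
    Assumption: one word decision per talker per trial.
    """
    if not results:
        raise ValueError("need at least one trial")
    n = len(results)
    acc_a = 100.0 * sum(a for a, _ in results) / n
    acc_b = 100.0 * sum(b for _, b in results) / n
    # A trial counts toward the simultaneous score only if both talkers are correct.
    simultaneous = 100.0 * sum(a and b for a, b in results) / n
    return acc_a, acc_b, simultaneous


# Hypothetical outcomes for four two-talker trials.
trials = [(True, True), (True, False), (False, True), (True, True)]
print(word_accuracies(trials))
```

Because a trial must be correct for both talkers to count, the simultaneous score can never exceed the lower of the two per-talker scores, which is consistent with the abstract reporting it at a lower level (above 70%) than the per-talker Word Accuracy (above 80%).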
Description: HSC2001: IEEE International Workshop on Hands-Free Speech Communication, April 9-11, 2001, Kyoto, Japan.
Rights: Copyright 2001 IEEE
Text Version: Publisher
Appears in Collections: Graduate School of Information Science

Files in This Item:

File              Size      Format
HSC_2001_103.pdf  457.5 kB  Adobe PDF


