NAISTAR

Please use this identifier to cite or link to this item: http://hdl.handle.net/10061/7828

Title: Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
Authors: Tobias Cincarek
Hiromichi Kawanami
Ryuichi Nishimura
Akinobu Lee
Hiroshi Saruwatari
Kiyohiro Shikano
Keywords: real environment
speech-oriented guidance system
development simulation
portability
speech recognizer question and answer database
Issue Date: Mar-2008
Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)
Journal Title: IEICE Transactions on Information and Systems
Volume: E91-D
Issue: 3
Start page: 576
End page: 587
Abstract: This paper investigates the development, long-term operation and portability of a practical ASR application in a real environment. The target application is a speech-oriented guidance system installed at a local community center. The system has been open to the general public since November 2002. More than 300 hours, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data that has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e. the speech recognizer and the response generation module, is assessed. Although the results depend on the system's modeling capacity and the domain complexity, the experiments show that overall performance stagnates after employing about 10-15k utterances for training the acoustic model, 40-50k utterances for training the language model and 40-50k utterances for compiling the question and answer database. The Q&A database was the most important factor in improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since collecting and preparing large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.
URI: http://hdl.handle.net/10061/7828
URL: https://search.ieice.org/
ISSN: 0916-8532
Rights: Copyright (C) 2008 The Institute of Electronics, Information and Communication Engineers (IEICE).
Text Version: publisher
Publisher DOI: 10.1093/ietisy/e91-d.3.576
Appears in Collections: Graduate School of Information Science

Files in This Item:
File: IEICETransInfoSys_E91D_3_576.pdf
Size: 8.8 MB
Format: Adobe PDF