Please use this identifier to cite or link to this item: http://hdl.handle.net/10061/7894

Title: Training Data Size Requirements for Topic Classification in a Speech-Oriented Guidance System
Authors: Rafael Torres, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano
Issue Date: Dec-2010
Start page: 486
End page: 489
Abstract: In this work, we address the topic classification of Japanese utterances received by a speech-oriented guidance system operating in a real environment. Implementing this kind of system requires collecting and manually labeling actual user utterances, which is a costly process. We are therefore interested in evaluating how the amount of training data influences topic classification. To this end, we compared the performance of a Support Vector Machine and a Maximum Entropy classifier using training data of different sizes. We used actual data collected from adults and children by the speech-oriented guidance system Takemaru-kun, and also evaluated the effect of automatic speech recognition (ASR) errors on classification performance. To deal with the shortness of the utterances, we proposed using characters as features, which is possible in Japanese due to the presence of kanji: ideograms derived from Chinese characters that represent not only sound but also meaning. Experimental results show an average performance decrease of 4.6% for ASR results of utterances from adults, and 2.8% for children, when the amount of training data is reduced to 25% of its original size; and a classification performance improvement from 92.2% to 94.1% for adults and from 87.2% to 88.3% for children when using characters as features instead of words.
Description: APSIPA Annual Summit and Conference 2010, December 14-17, 2010, Biopolis, Singapore.
URI: http://hdl.handle.net/10061/7894
Rights: Copyright 2010 APSIPA
Text Version: Publisher
Appears in Collections: Graduate School of Information Science

Files in This Item:

File: APSIPA_2010_486.pdf
Size: 1.44 MB
Format: Adobe PDF

