Title: Automatic N-gram Language Model Creation from Web Resources
Authors: Ryuichi Nishimura
Kumiko Komatsu
Yuka Kuroda
Kentaro Nagatomo
Akinobu Lee
Hiroshi Saruwatari
Kiyohiro Shikano
Issue Date: Sep-2001
Start page: 2127
End page: 2130
Abstract: This paper describes the automatic construction of N-gram language models from Web texts for large-vocabulary continuous speech recognition. Training such a model requires a huge amount of well-formed text, but collecting and organizing a text corpus by hand for every task is labor-intensive. The language model also needs to be updated frequently to cover current topics. To address these problems, we propose a method that creates language models automatically by collecting Web texts through keyword-based Web search engines. Selecting keywords suited to the task yields a task-dependent language model. A text filtering algorithm based on character perplexity is developed to extract proper Japanese text from the collected Web texts. A language model for a medical consulting task created by the proposed method achieves a word recognition rate 11.4% higher than that of a conventional newspaper language model.
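Illustration: the abstract does not give the details of the character-perplexity filter, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a character bigram model with add-one smoothing trained on a small set of well-formed reference sentences, and keeps a candidate Web text only if its per-character perplexity falls below a threshold. The function names, the smoothing scheme, and the threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_char_bigram(texts):
    """Count character bigrams over well-formed reference texts.

    Assumption: a bigram model with add-one smoothing stands in for
    whatever character model the paper actually uses.
    """
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for text in texts:
        chars = ["<s>"] + list(text)  # "<s>" marks the start of a text
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # +1 reserves one vocabulary slot for characters never seen in training
    return unigrams, bigrams, len(vocab) + 1


def char_perplexity(text, model):
    """Per-character perplexity of `text` under the smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    chars = ["<s>"] + list(text)
    if len(chars) < 2:
        return float("inf")  # empty text: treat as maximally implausible
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(chars) - 1))


def filter_texts(candidates, model, threshold):
    """Keep only candidates whose perplexity does not exceed the threshold."""
    return [t for t in candidates if char_perplexity(t, model) <= threshold]


if __name__ == "__main__":
    # Hypothetical reference sentences in the medical consulting domain.
    reference = [
        "今日は頭が痛いです。",
        "昨日から熱があります。",
        "薬を飲んでも咳が止まりません。",
    ]
    model = train_char_bigram(reference)
    for candidate in ["今日は熱があります。", "<<< click here >>> 12345 |||"]:
        print(candidate, round(char_perplexity(candidate, model), 1))
```

In this sketch, text that resembles the reference sentences scores a lower perplexity than menu fragments or markup debris, so a threshold tuned on held-out data would separate the two. A real system would train on a much larger corpus and would likely use a higher-order model.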
Description: EUROSPEECH2001: the 7th European Conference on Speech Communication and Technology, September 3-7, 2001, Aalborg, Denmark.
ISSN: 1018-4074
Rights: Copyright 2001 ISCA
Text Version: Publisher
Appears in Collections: 情報科学研究科 / Graduate School of Information Science

Files in This Item: EUROSPEECH_2001_2127.pdf (623.48 kB, Adobe PDF)