Please use this identifier to cite or link to this item:
http://hdl.handle.net/10061/7919
| Title: | Speech to Lip Synthesis by HMM |
| Authors: | Eli Yamamoto; Satoshi Nakamura; Kiyohiro Shikano |
| Issue Date: | Sep-1997 |
| Start page: | 137 |
| End page: | 140 |
| Abstract: | Synthesized lip movement images can compensate for the lack of auditory information for hearing-impaired people and can also help give computer agents a human-like face. We propose a novel method that synthesizes lip movement by mapping from input speech using HMMs. This paper compares the HMM method with conventional methods that use VQ or ANN to convert speech to lip movement images. In the experiment, the error and the time difference error between the synthesized lip movement images and the original ones are used for evaluation. The results show that the error of the HMM method is 8.6% smaller than that of the VQ method, and that the HMM method reduces the time difference error by 34.8% relative to the VQ method. The results also show that the errors are mostly caused by the phonemes /h/ and /Q/. Since those phonemes depend on the succeeding phoneme, context-dependent synthesis is applied to the HMM method to reduce the error. The context-dependent HMM method reduces the error (time difference error) by 11.3% (8.9%) compared with the original HMM method. |
| Description: | AVSP 1997: Audio Visual Signal Processing Workshop, September 6-7, 1997, Rhodes, Greece. |
| URI: | http://hdl.handle.net/10061/7919 |
| ISSN: | 1018-4554 |
| Rights: | Copyright 1997 ISCA |
| Text Version: | Publisher |
| Appears in Collections: | Graduate School of Information Science |