Please use this identifier to cite or link to this item:
http://hdl.handle.net/10061/8099
|
| Title: | Speech-to-Lip Movement Synthesis Based on the EM Algorithm Using Audio-Visual HMMs |
| Authors: | Eli Yamamoto; Satoshi Nakamura; Kiyohiro Shikano |
| Issue Date: | Nov-1998 |
| Start page: | 1275 |
| End page: | 1278 |
| Article Number: | 0756 |
| Abstract: | This paper proposes a method to re-estimate output visual parameters for speech-to-lip movement synthesis using audio-visual Hidden Markov Models (HMMs) under the Expectation-Maximization (EM) algorithm. A conventional approach to speech-to-lip movement synthesis estimates a visual parameter sequence through Viterbi alignment of an input acoustic speech signal using audio HMMs. However, this HMM-Viterbi method has a substantial problem: an incorrect HMM state alignment may produce incorrect visual parameters. The problem stems from the deterministic synthesis process, which assigns a single HMM state to each input audio frame. The proposed method avoids this deterministic process by re-estimating non-deterministic visual parameters while maximizing the likelihood of the audio-visual observation sequence under the EM algorithm. |
| Description: | ICSLP1998: the 5th International Conference on Spoken Language Processing, November 30 - December 4, 1998, Sydney, Australia. |
| URI: | http://hdl.handle.net/10061/8099 |
| Text Version: | Publisher |
| Appears in Collections: | 情報科学研究科 / Graduate School of Information Science |
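
The contrast drawn in the abstract, between assigning a single HMM state to each audio frame and avoiding that deterministic choice, can be illustrated with a toy sketch. This is not the paper's EM re-estimation procedure: it assumes that per-frame state posteriors are already available and only compares a hard (Viterbi-like) pick with a posterior-weighted average. The function names, the posteriors, and the per-state visual means are all hypothetical.

```python
def hard_estimate(posteriors, state_means):
    """Per frame, output the visual mean of the single most likely state."""
    out = []
    for frame in posteriors:
        best = max(range(len(frame)), key=lambda s: frame[s])
        out.append(list(state_means[best]))
    return out


def soft_estimate(posteriors, state_means):
    """Per frame, output the posterior-weighted average of the visual means."""
    dim = len(state_means[0])
    out = []
    for frame in posteriors:
        out.append([
            sum(p * mean[d] for p, mean in zip(frame, state_means))
            for d in range(dim)
        ])
    return out


# Hypothetical toy data: 2 states, 2-dimensional visual parameters, 3 frames.
state_means = [[0.0, 1.0], [1.0, 0.0]]
posteriors = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]

hard = hard_estimate(posteriors, state_means)
soft = soft_estimate(posteriors, state_means)
```

In the second frame the two states are almost equally likely. The hard pick commits fully to state 0, so a small alignment error would flip the output entirely, while the weighted average stays between the two state means.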