
dc.contributor.author Tjandra, Andros
dc.contributor.author Sakti, Sakriani
dc.contributor.author Nakamura, Satoshi
dc.date.accessioned 2020-04-30T07:48:26Z
dc.date.available 2020-04-30T07:48:26Z
dc.date.issued 2020-03-02
dc.identifier.issn 2329-9304
dc.identifier.uri http://hdl.handle.net/10061/13982
dc.description.abstract Despite the close relationship between speech perception and production, research in automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has progressed more or less independently without exerting much mutual influence. In human communication, on the other hand, a closed-loop speech chain mechanism with auditory feedback from the speaker's mouth to her ear is crucial. In this paper, we take a step further and develop a closed-loop machine speech chain model based on deep learning. The sequence-to-sequence model in closed-loop architecture allows us to train our model on the concatenation of both labeled and unlabeled data. While ASR transcribes the unlabeled speech features, TTS attempts to reconstruct the original speech waveform based on the text from ASR. In the opposite direction, ASR also attempts to reconstruct the original text transcription given the synthesized speech. To the best of our knowledge, this is the first deep learning framework that integrates human speech perception and production behaviors. Our experimental results show that the proposed approach significantly improved performance over that from separate systems that were only trained with labeled data. ja_JP
dc.language.iso en ja_JP
dc.publisher IEEE ja_JP
dc.relation.isreplacedby https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9020132 ja_JP
dc.rights Authors ja_JP
dc.subject Speech processing ja_JP
dc.subject Data models ja_JP
dc.subject Machine learning ja_JP
dc.subject Task analysis ja_JP
dc.subject Training ja_JP
dc.subject Hidden Markov models ja_JP
dc.title Machine Speech Chain ja_JP
dc.type.nii Journal Article ja_JP
dc.contributor.transcription ナカムラ, サトシ
dc.contributor.alternative 中村, 哲
dc.textversion none ja_JP
dc.identifier.jtitle IEEE/ACM Transactions on Audio, Speech, and Language Processing ja_JP
dc.identifier.volume 28 ja_JP
dc.identifier.spage 976 ja_JP
dc.identifier.epage 989 ja_JP
dc.relation.doi 10.1109/TASLP.2020.2977776 ja_JP
dc.identifier.NAIST-ID 85624831 ja_JP
dc.identifier.NAIST-ID 73297715 ja_JP
dc.identifier.NAIST-ID 73296626 ja_JP

