DSpace Repository

A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments

dc.contributor.author Novitasari, Sashi en
dc.contributor.author Sakti, Sakriani en
dc.contributor.author Nakamura, Satoshi en
dc.date.accessioned 2022-08-24T06:47:02Z en
dc.date.available 2022-08-24T06:47:02Z en
dc.date.issued 2022-08-05 en
dc.identifier.uri http://hdl.handle.net/10061/14770 en
dc.description.abstract Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments because most of these systems were not designed to handle noisy environments. Several works attempted to address this problem by using offline fine-tuning to adapt their TTS to noisy conditions. Unlike machines, humans never perform offline fine-tuning. Instead, they speak with the Lombard effect in noisy places, where they dynamically adjust their vocal effort to improve the audibility of their speech. This ability is supported by the speech chain mechanism, which involves auditory feedback passing from speech perception to speech production. This paper proposes an alternative approach to TTS in noisy environments that is closer to the human Lombard effect. Specifically, we implement Lombard TTS in a machine speech chain framework to synthesize speech with dynamic adaptation. Our TTS performs adaptation by generating speech utterances based on auditory feedback that consists of the automatic speech recognition (ASR) loss as the speech intelligibility measure and the speech-to-noise ratio (SNR) prediction as the power measure. Two versions of TTS are investigated: non-incremental TTS with utterance-level feedback, and incremental TTS (ITTS) with short-term feedback to reduce the delay without significant performance loss. Furthermore, we evaluate the TTS systems in both static and dynamic noise conditions. Our experimental results show that auditory feedback enhanced the TTS speech intelligibility in noise. en
dc.language.iso en en
dc.publisher IEEE en
dc.relation.isreplacedby https://ieeexplore.ieee.org/document/9851526 en
dc.rights This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ en
dc.title A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments en
dc.type.nii Journal Article en
dc.contributor.transcription ナカムラ, サトシ ja
dc.contributor.alternative 中村, 哲 ja
dc.textversion none en
dc.identifier.eissn 2329-9304 en
dc.identifier.jtitle IEEE/ACM Transactions on Audio, Speech, and Language Processing en
dc.identifier.volume 30 en
dc.identifier.spage 2673 en
dc.identifier.epage 2688 en
dc.relation.doi 10.1109/TASLP.2022.3196879 en
dc.identifier.NAIST-ID 86635778 en
dc.identifier.NAIST-ID 73297715 en
dc.identifier.NAIST-ID 73296626 en


Files in this item

There are no files associated with this item.
