DSpace Repository

Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load

dc.contributor.author Wu, Bin
dc.contributor.author Sakti, Sakriani
dc.contributor.author Zhang, Jinsong
dc.contributor.author Nakamura, Satoshi
dc.date.accessioned 2021-01-07T06:30:57Z
dc.date.available 2021-01-07T06:30:57Z
dc.date.issued 2020-12-02
dc.identifier.uri http://hdl.handle.net/10061/14206
dc.description.abstract The human perception of phonemes is biased against speech sounds. The lack of correspondence between perceptual phonemes and acoustic signals forms a big challenge in designing unsupervised algorithms to distinguish phonemes from sound. We propose the DPGMM-RNN hybrid model that improves phoneme categorization by relieving the fragmentation problem. We also merge segments with low functional load, which is the work done by segment contrasts to differentiate between utterances, just like humans who convert unambiguous segments into phonemes as units for immediate perception. Our results show that the DPGMM-RNN hybrid model relieves the fragmentation problem and improves phoneme discriminability. The minimal functional load merge compresses a segment system, preserves information and keeps phoneme discriminability. ja_JP
dc.language.iso en ja_JP
dc.publisher IEEE ja_JP
dc.relation.isreplacedby https://ieeexplore.ieee.org/document/9276474 ja_JP
dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ ja_JP
dc.subject Acoustics ja_JP
dc.subject Clustering algorithms ja_JP
dc.subject Auditory system ja_JP
dc.subject Ear ja_JP
dc.subject Visualization ja_JP
dc.subject Load modeling ja_JP
dc.subject Context modeling ja_JP
dc.title Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load ja_JP
dc.type.nii Journal Article ja_JP
dc.contributor.transcription ナカムラ, サトシ
dc.contributor.alternative 中村, 哲
dc.textversion none ja_JP
dc.identifier.eissn 2329-9304
dc.identifier.jtitle IEEE/ACM Transactions on Audio, Speech, and Language Processing ja_JP
dc.identifier.volume 29 ja_JP
dc.identifier.spage 348 ja_JP
dc.identifier.epage 362 ja_JP
dc.relation.doi 10.1109/TASLP.2020.3042016 ja_JP
dc.identifier.NAIST-ID 85629731 ja_JP
dc.identifier.NAIST-ID 73297715 ja_JP
dc.identifier.NAIST-ID 73296626 ja_JP
