
Spoken language resources for Cantonese speech processing

Lee T., Lo W.K., Ching P.C., Meng H.

Article

Abstract

This paper describes the development of CU Corpora, a series of large-scale
speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese
dialect in Southern China and Hong Kong. CU Corpora are the first of their
kind and intended to serve as an important infrastructure for the
advancement of speech recognition and synthesis technologies for this
widely used Chinese dialect. They contain a large amount of speech data
that cover various linguistic units of spoken Cantonese, including isolated
syllables, polysyllabic words and continuous sentences. While some of the
corpora are created for specific applications of common interest, the
others are designed with emphasis on the coverage and distributions of
different phonetic units, including the contextual ones. The speech data
are annotated manually so as to provide sufficient orthographic and
phonetic information for the development of different applications.
Statistical analysis of the annotated data shows that CU Corpora contain
rich and balanced phonetic content. The usefulness of the corpora is also
demonstrated with a number of speech recognition and speech synthesis
applications.

Details

Keywords: Speech databases development, Chinese dialects, Chinese phonology and phonetics, Annotation of speech data, Applications of speech technology, Speech recognition, Text-to-speech synthesis

Published in: Speech Communication, Vol. 36, No. 3, March 2002, pp. 327-342
