Recent Advances in End-to-End Speech Recognition

Shigeki Karita

Ikoma : Nara Institute of Science and Technology, 2019.11

Lecture Archive

Volume information

1 item in total
No. Print year Location Call number Material ID Loan category Status Number of reservations

1

  • LA-I-R[MPDASH][Mobile]

M016889

Abstract

This talk explains recent advances in end-to-end automatic speech recognition (ASR) at NTT. First, I will give an overview of NTT speech technologies and the open-source toolkit ESPnet. Then, I will introduce our proposed semi-supervised end-to-end ASR method (ICASSP19 https://ieeexplore.ieee.org/abstract/document/8682890). In this paper, we introduce speech and text autoencoders that share encoders and decoders with an ASR model, so that large speech-only and text-only training datasets can be used to improve ASR performance. To build the speech and text autoencoders, we leverage state-of-the-art ASR and text-to-speech (TTS) encoder-decoder architectures. These autoencoders learn features from speech-only and text-only datasets by switching the encoders and decoders used in the ASR and TTS models. Simultaneously, a multi-task loss encourages them to encode features that are compatible with the ASR and TTS models. Additionally, we anticipate that joint training with TTS can also improve ASR performance, because both the ASR and TTS models learn transformations between speech and text. In our experiments with semi-supervised end-to-end ASR/TTS training, retraining a model initially trained on a small paired subset of the LibriSpeech corpus with a large unpaired subset of the corpus reduced the character error rate from 10.4% to 8.4% and the word error rate from 20.6% to 18.0%.
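The encoder/decoder switching described in the abstract can be pictured with a small toy sketch. This is only an illustration under stated assumptions, not the method from the paper: the four modules are hypothetical scalar functions standing in for neural networks, the mean-squared-error loss and the task weights are placeholders, and the names `TASKS`, `task_loss`, and `multi_task_loss` are invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Module = Callable[[Vector], Vector]

# Hypothetical stand-ins for the neural encoders and decoders.
# They are chosen as exact inverses of each other purely so the toy is easy to follow.
def speech_encoder(speech: Vector) -> Vector:
    return [0.5 * v for v in speech]

def text_encoder(text: Vector) -> Vector:
    return [2.0 * v for v in text]

def speech_decoder(hidden: Vector) -> Vector:
    return [2.0 * v for v in hidden]

def text_decoder(hidden: Vector) -> Vector:
    return [0.5 * v for v in hidden]

# Each task pairs one encoder with one decoder; the modules are shared across tasks.
TASKS: Dict[str, Tuple[Module, Module]] = {
    "asr": (speech_encoder, text_decoder),          # speech -> text
    "tts": (text_encoder, speech_decoder),          # text -> speech
    "speech_ae": (speech_encoder, speech_decoder),  # speech -> speech
    "text_ae": (text_encoder, text_decoder),        # text -> text
}

def mse(predicted: Vector, target: Vector) -> float:
    """Mean squared error, used here as a placeholder loss (an assumption)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def task_loss(task: str, source: Vector, target: Vector) -> float:
    encoder, decoder = TASKS[task]
    return mse(decoder(encoder(source)), target)

def multi_task_loss(
    paired: Tuple[Vector, Vector],
    speech_only: Vector,
    text_only: Vector,
    weights: Dict[str, float],
) -> float:
    """Weighted sum of the four task losses (the weighting scheme is an assumption)."""
    speech, text = paired
    losses = {
        "asr": task_loss("asr", speech, text),
        "tts": task_loss("tts", text, speech),
        "speech_ae": task_loss("speech_ae", speech_only, speech_only),
        "text_ae": task_loss("text_ae", text_only, text_only),
    }
    return sum(weights[name] * value for name, value in losses.items())

# Paired data feeds ASR and TTS; unpaired data feeds the two autoencoders.
weights = {"asr": 1.0, "tts": 1.0, "speech_ae": 0.5, "text_ae": 0.5}
total = multi_task_loss(([2.0, 4.0], [0.5, 1.0]), [6.0, 8.0], [1.0, 3.0], weights)
print(total)
```

The point of the sketch is the routing table: because the same encoder and decoder objects appear in several tasks, unpaired speech or text still updates modules that the ASR path uses.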

Details

Publication year

2019

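Format

Digitized video material (1 hr 26 min 46 sec)

Series title

Division of Information Science Colloquium ; Academic Year 2019

Notes

Speaker affiliation: NTT Communication Science Laboratories

Lecture date: November 13, 2019

Venue: Large Lecture Room L1, Information Science Building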

Title language

English (eng)

Text language

English (eng)

Author information

Karita, Shigeki