The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC) systems, based on feature learning and an end-to-end training procedure. The proposed model combines a feature learning block, recurrent neural network (RNN) layers, and a connectionist temporal classification (CTC) loss into an end-to-end ASR model. To cope with the complex acoustic environment of ATC speech, a learning block extracts informative features directly from raw waveforms for acoustic modeling, replacing handcrafted features. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and passed to the RNN layers for temporal modeling. Because it learns representations from raw waveforms, the proposed model can be optimized fully end to end, i.e., from waveform to text. Finally, to handle the multilingual nature of ATC speech, the ASR task is performed over a combined vocabulary of Chinese characters and English letters. The approach is validated on a real-world multilingual corpus (ATCSpeech), and the experimental results show that it outperforms the other baselines, achieving a 6.9% character error rate.
Peng FAN
Sichuan University
Xiyao HUA
Sichuan University
Yi LIN
Sichuan University
Bo YANG
Sichuan University
Jianwei ZHANG
Sichuan University
Wenyi GE
Chengdu University of Information Technology
Dongyue GUO
Sichuan University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peng FAN, Xiyao HUA, Yi LIN, Bo YANG, Jianwei ZHANG, Wenyi GE, Dongyue GUO, "Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training" in IEICE TRANSACTIONS on Information and Systems, vol. E106-D, no. 4, pp. 538-544, April 2023, doi: 10.1587/transinf.2022EDP7151.
Abstract: In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7151/_p
@ARTICLE{e106-d_4_538,
author={Peng FAN and Xiyao HUA and Yi LIN and Bo YANG and Jianwei ZHANG and Wenyi GE and Dongyue GUO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training},
year={2023},
volume={E106-D},
number={4},
pages={538-544},
abstract={In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.},
keywords={},
doi={10.1587/transinf.2022EDP7151},
ISSN={1745-1361},
month={April},}
TY  - JOUR
TI  - Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 538
EP  - 544
AU  - Peng FAN
AU  - Xiyao HUA
AU  - Yi LIN
AU  - Bo YANG
AU  - Jianwei ZHANG
AU  - Wenyi GE
AU  - Dongyue GUO
PY  - 2023
DO  - 10.1587/transinf.2022EDP7151
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E106-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - April 2023
AB  - In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.
ER  - 