The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals may appear as "XNUMX".
In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
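As a rough illustration of two quantities named in the abstract, the sketch below shows one common way to apply speaker-dependent cepstral mean normalization and to compute a Pearson correlation coefficient between machine and expert scores. It is not the authors' implementation: the function names `speaker_cmn` and `pearson_cc`, the input layout, and the choice of Pearson correlation are assumptions made for illustration, and the abstract does not say how the "average CC" is averaged.

```python
from collections import defaultdict
from math import sqrt


def speaker_cmn(utterances):
    """Subtract each speaker's mean cepstral vector from that speaker's frames.

    `utterances` is a list of (speaker_id, frames) pairs (hypothetical layout),
    where `frames` is a list of equal-length cepstral vectors.
    """
    sums = {}
    counts = defaultdict(int)
    # Accumulate per-speaker sums over all frames of all utterances.
    for speaker, frames in utterances:
        for frame in frames:
            if speaker not in sums:
                sums[speaker] = [0.0] * len(frame)
            for i, value in enumerate(frame):
                sums[speaker][i] += value
            counts[speaker] += 1
    means = {s: [v / counts[s] for v in total] for s, total in sums.items()}
    # Remove the speaker-level mean from every frame.
    return [
        (speaker, [[v - m for v, m in zip(frame, means[speaker])] for frame in frames])
        for speaker, frames in utterances
    ]


def pearson_cc(machine_scores, expert_scores):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(machine_scores)
    mean_m = sum(machine_scores) / n
    mean_e = sum(expert_scores) / n
    cov = sum((m - mean_m) * (e - mean_e)
              for m, e in zip(machine_scores, expert_scores))
    var_m = sum((m - mean_m) ** 2 for m in machine_scores)
    var_e = sum((e - mean_e) ** 2 for e in expert_scores)
    return cov / sqrt(var_m * var_e)
```

Pooling the mean over all of a speaker's utterances, rather than per utterance, is what makes the normalization speaker-dependent in this sketch; a value of `pearson_cc` near 1 means the machine scores track the expert scores closely.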
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Fengpei GE, Changliang LIU, Jian SHAO, Fuping PAN, Bin DONG, Yonghong YAN, "Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 10, pp. 2485-2492, October 2008, doi: 10.1093/ietisy/e91-d.10.2485.
Abstract: In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.10.2485/_p
@ARTICLE{e91-d_10_2485,
author={Fengpei GE and Changliang LIU and Jian SHAO and Fuping PAN and Bin DONG and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech},
year={2008},
volume={E91-D},
number={10},
pages={2485--2492},
abstract={In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.},
keywords={},
doi={10.1093/ietisy/e91-d.10.2485},
ISSN={1745-1361},
month={October},
}
TY - JOUR
TI - Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech
T2 - IEICE TRANSACTIONS on Information
SP - 2485
EP - 2492
AU - Fengpei GE
AU - Changliang LIU
AU - Jian SHAO
AU - Fuping PAN
AU - Bin DONG
AU - Yonghong YAN
PY - 2008
DO - 10.1093/ietisy/e91-d.10.2485
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2008
AB - In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
ER -