Lei ZHOU
Nagoya University
Ryohei SASANO
Nagoya University
Koichi TAKEDA
Nagoya University
Lei ZHOU, Ryohei SASANO, Koichi TAKEDA, "Inference Discrepancy Based Curriculum Learning for Neural Machine Translation" in IEICE TRANSACTIONS on Information and Systems, vol. E107-D, no. 1, pp. 135-143, January 2024, doi: 10.1587/transinf.2023EDP7048.
Abstract: In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. For the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. We therefore propose adopting the inference discrepancy of each training example as the difficulty criterion and ranking training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We liken this training scheme to a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and gain insight into the influence of translation direction, evaluation metrics, and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English, and the Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, and English ⇔ Russian demonstrate that our proposed method consistently improves translation performance over an advanced Transformer baseline.
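The abstract describes the method only at a high level. As a rough illustration, not the authors' implementation, the Python sketch below scores each training pair by sentence-level BLEU (via sacrebleu) between a pretrained model's hypothesis and the reference, one plausible instantiation of the inference-discrepancy criterion, then ranks pairs from easy to hard and derives a simple phased curriculum. The translate function, the choice of sentence BLEU, and the bucketed schedule are all illustrative assumptions.

# Hedged sketch of inference-discrepancy-based curriculum ordering.
# Assumptions (not from the paper): sentence-level BLEU as the
# discrepancy metric and a simple "grow the easy subset" schedule.
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)  # effective n-gram order for short sentences

def rank_by_inference_discrepancy(sources, references, translate):
    """Rank (source, reference) pairs from easy to hard.

    `translate` stands in for a pretrained vanilla model's inference
    function (source sentence -> hypothesis string). A low sentence
    BLEU between hypothesis and reference means a large inference
    discrepancy, i.e. a harder example.
    """
    scored = []
    for src, ref in zip(sources, references):
        hyp = translate(src)
        score = bleu.sentence_score(hyp, [ref]).score
        scored.append((src, ref, score))
    # Higher BLEU = smaller discrepancy = easier, so sort descending.
    scored.sort(key=lambda t: t[2], reverse=True)
    return [(src, ref) for src, ref, _ in scored]

def curriculum_phases(ranked, num_phases=4):
    """Yield the training subset for each curriculum phase: phase k
    trains on the easiest k/num_phases fraction of the data."""
    for k in range(1, num_phases + 1):
        cutoff = len(ranked) * k // num_phases
        yield ranked[:cutoff]

In a full training loop, each phase's subset would be fed to the curriculum model in turn, with the final phase covering the entire training set.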
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7048/_p
@ARTICLE{e107-d_1_135,
author={Lei ZHOU and Ryohei SASANO and Koichi TAKEDA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Inference Discrepancy Based Curriculum Learning for Neural Machine Translation},
year={2024},
volume={E107-D},
number={1},
pages={135--143},
abstract={In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. For the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. We therefore propose adopting the inference discrepancy of each training example as the difficulty criterion and ranking training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We liken this training scheme to a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and gain insight into the influence of translation direction, evaluation metrics, and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English, and the Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, and English ⇔ Russian demonstrate that our proposed method consistently improves translation performance over an advanced Transformer baseline.},
keywords={},
doi={10.1587/transinf.2023EDP7048},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Inference Discrepancy Based Curriculum Learning for Neural Machine Translation
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 135
EP - 143
AU - Lei ZHOU
AU - Ryohei SASANO
AU - Koichi TAKEDA
PY - 2024
DO - 10.1587/transinf.2023EDP7048
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - January 2024
AB - In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. For the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. We therefore propose adopting the inference discrepancy of each training example as the difficulty criterion and ranking training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We liken this training scheme to a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and gain insight into the influence of translation direction, evaluation metrics, and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English, and the Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, and English ⇔ Russian demonstrate that our proposed method consistently improves translation performance over an advanced Transformer baseline.
ER -