Katsuyuki HAGIWARA
Mie University
Katsuyuki HAGIWARA, "On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 9, pp. 1537-1545, September 2023, doi: 10.1587/transinf.2023EDP7008.
Abstract: In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected into the inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis of a method that injects noise into the training process via DA. We consider the training process under three training situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training for DA with on-line copies is approximately equivalent to l2 regularization training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we show that DA with on-line copies apparently leads to an increase in the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original input and the additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment, in which we find that our result can be applied to usual off-line DA in an under-parameterization scenario but not in an over-parameterization scenario. Moreover, we experimentally investigate the training process of neural networks under DA with off-line noisy copies and find that our analysis of linear regression can be qualitatively applied to neural networks.
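The abstract's central claim is that gradient descent under DA with on-line noisy input copies approximately tracks l2-regularized training, where the variance of the injected noise (rather than the number of copies) sets the regularization strength. The following minimal Python sketch, which is not the paper's code, compares the two procedures on synthetic data. The data, noise level, copy count, and learning rate are illustrative assumptions, and the gradient is averaged over copies so that the apparent learning-rate increase discussed in the paper does not enter the comparison.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                       # samples, input dimension (illustrative)
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

sigma = 0.3                         # std of noise injected into inputs (assumption)
n_copies = 3                        # noisy copies regenerated at every epoch (assumption)
lr, epochs = 0.01, 2000             # illustrative learning rate and epoch count

# (a) DA with on-line noisy copies: fresh noise is drawn at every epoch and the
#     squared-error gradient is averaged over the copies.
w_da = np.zeros(d)
for _ in range(epochs):
    copies = [X + sigma * rng.normal(size=X.shape) for _ in range(n_copies)]
    grad = sum(Xc.T @ (Xc @ w_da - y) for Xc in copies) / n_copies
    w_da -= lr * grad / n

# (b) Explicit l2 (ridge) regularization on the clean data. The expected squared
#     error under input noise equals the clean squared error plus n * sigma**2 * ||w||^2,
#     so n * sigma**2 is the matching penalty strength under these assumptions.
w_l2 = np.zeros(d)
for _ in range(epochs):
    grad = X.T @ (X @ w_l2 - y) + n * sigma**2 * w_l2
    w_l2 -= lr * grad / n

print("||w_da - w_l2|| =", np.linalg.norm(w_da - w_l2))
print("||w_da - w_true|| =", np.linalg.norm(w_da - w_true))

Under these assumptions the two weight vectors end up close, and increasing n_copies mainly reduces the epoch-to-epoch fluctuation of w_da rather than changing the solution it converges to, in line with the abstract's statement that the noise variance matters while the number of copies does not.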
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7008/_p
@ARTICLE{e106-d_9_1537,
author={Katsuyuki HAGIWARA},
journal={IEICE TRANSACTIONS on Information},
title={On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies},
year={2023},
volume={E106-D},
number={9},
pages={1537-1545},
abstract={In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyze the situation where noisy copies are newly generated and injected into inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis on a method using noise injection into a training process by DA. We considered the training process under three training situations which are the full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to the l2 regularization training for which variance of injected noise is important, whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of learning rate in full-batch condition under the sum of squared errors and the mini-batch condition under the mean squared error. The apparent increase in learning rate and regularization effect can be attributed to the original input and additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment in which we found that our result can be applied to usual off-line DA in an under-parameterization scenario and can not in an over-parametrization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis on linear regression can be qualitatively applied to neural networks.},
keywords={},
doi={10.1587/transinf.2023EDP7008},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies
T2 - IEICE TRANSACTIONS on Information
SP - 1537
EP - 1545
AU - Katsuyuki HAGIWARA
PY - 2023
DO - 10.1587/transinf.2023EDP7008
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2023
AB - In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyze the situation where noisy copies are newly generated and injected into inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis on a method using noise injection into a training process by DA. We considered the training process under three training situations which are the full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to the l2 regularization training for which variance of injected noise is important, whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of learning rate in full-batch condition under the sum of squared errors and the mini-batch condition under the mean squared error. The apparent increase in learning rate and regularization effect can be attributed to the original input and additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment in which we found that our result can be applied to usual off-line DA in an under-parameterization scenario and can not in an over-parametrization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis on linear regression can be qualitatively applied to neural networks.
ER -