The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals may be expressed as "XNUMX".
Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
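To make the pipelining idea in the abstract concrete, the following is a minimal NumPy sketch (not the authors' FPGA/HLS implementation) of delayed-gradient ("time-delay") weight updates: each layer queues its gradient and applies it only after a per-layer delay, the way an update would arrive late at earlier layers in a one-PE-per-layer systolic pipeline. The layer sizes, learning rate, and delay schedule are illustrative assumptions, not values from the paper.

import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    return (x > 0).astype(x.dtype)

# Toy MLP: 32 -> 64 -> 64 -> 4 (sizes are illustrative, not from the paper).
sizes = [32, 64, 64, 4]
W = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
L = len(W)
lr = 0.01

# One pending-gradient queue per layer.  Layer l applies its update
# (L - 1 - l) steps late, mimicking how the error gradient reaches earlier
# layers later when each layer is a separate PE in a pipelined systolic array.
delay = [L - 1 - l for l in range(L)]
pending = [deque() for _ in range(L)]

def train_step(x, y):
    # Forward pass with the current (possibly stale-updated) weights.
    acts, pre = [x], []
    for l in range(L):
        z = acts[-1] @ W[l]
        pre.append(z)
        acts.append(relu(z) if l < L - 1 else z)

    # Backward pass: compute each layer's gradient now, but only queue it.
    err = acts[-1] - y                      # squared-error output gradient
    for l in reversed(range(L)):
        gW = acts[l].T @ err / x.shape[0]
        pending[l].append(gW)
        if l > 0:
            err = (err @ W[l].T) * relu_grad(pre[l - 1])

    # Apply only those queued updates whose delay has elapsed, so newer
    # mini-batches can start their forward pass before older updates land.
    for l in range(L):
        if len(pending[l]) > delay[l]:
            W[l] -= lr * pending[l].popleft()

# Usage example on random data.
for step in range(100):
    x = rng.normal(size=(16, sizes[0]))
    y = rng.normal(size=(16, sizes[-1]))
    train_step(x, y)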
Takeshi SENOO
Tokyo Institute of Technology
Akira JINGUJI
Tokyo Institute of Technology
Ryosuke KURAMOCHI
Tokyo Institute of Technology
Hiroki NAKAHARA
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takeshi SENOO, Akira JINGUJI, Ryosuke KURAMOCHI, Hiroki NAKAHARA, "Multilayer Perceptron Training Accelerator Using Systolic Array," in IEICE TRANSACTIONS on Information and Systems,
vol. E105-D, no. 12, pp. 2048-2056, December 2022, doi: 10.1587/transinf.2022PAP0003.
Abstract: Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022PAP0003/_p
@ARTICLE{e105-d_12_2048,
author={Takeshi SENOO and Akira JINGUJI and Ryosuke KURAMOCHI and Hiroki NAKAHARA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Multilayer Perceptron Training Accelerator Using Systolic Array},
year={2022},
volume={E105-D},
number={12},
pages={2048-2056},
abstract={Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.},
keywords={},
doi={10.1587/transinf.2022PAP0003},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Multilayer Perceptron Training Accelerator Using Systolic Array
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2048
EP - 2056
AU - Takeshi SENOO
AU - Akira JINGUJI
AU - Ryosuke KURAMOCHI
AU - Hiroki NAKAHARA
PY - 2022
DO - 10.1587/transinf.2022PAP0003
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E105-D
IS - 12
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - 2022/12
AB - Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
ER -