RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks

With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
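To make the abstract's two core ideas concrete, the following is a minimal NumPy sketch of (a) folding a batch-normalization layer into the preceding convolution so the BN layer can be removed after training, and (b) rounding the folded weights to the nearest power of two (a logarithmic number system) so that each multiplication reduces to a binary shift and an accumulate. This is an illustrative reconstruction of the general techniques named in the abstract, not the authors' QR-DNN or RNA implementation; all function names and shapes are hypothetical.

import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN applies gamma * (y - mean) / sqrt(var + eps) + beta to the conv
    # output y. Per output channel this is an affine map, so after
    # training it can be absorbed into the conv weights and bias and the
    # BN layer removed, as in the reconstruction step the abstract describes.
    scale = gamma / np.sqrt(var + eps)                       # (out_ch,)
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

def log2_quantize(w):
    # Round each weight to the nearest signed power of two, so that
    # w ~ sign * 2**exp and a weight is stored as a (sign, exponent) pair.
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, exp

def shift_accumulate(x_fixed, sign, exp):
    # Dot product with power-of-two weights: each multiply becomes a
    # binary shift of the fixed-point activation, as in a
    # shift-and-accumulate processing element. Real hardware keeps extra
    # fractional (guard) bits in the accumulator; right shifts here truncate.
    acc = 0
    for xi, si, ei in zip(x_fixed, sign, exp):
        acc += int(si) * (xi << ei if ei >= 0 else xi >> -ei)
    return acc

rng = np.random.default_rng(0)

# Fold hypothetical BN statistics into a small conv layer after training.
wc, bc = rng.normal(size=(16, 3, 3, 3)), np.zeros(16)
gamma, beta = np.ones(16), np.zeros(16)
mu, var = rng.normal(size=16), np.abs(rng.normal(size=16)) + 0.1
wc_folded, bc_folded = fold_bn_into_conv(wc, bc, gamma, beta, mu, var)

# Approximate an 8-element dot product using shifts only.
w = rng.normal(size=8)
x = [int(v) for v in rng.integers(-128, 128, size=8)]        # 8-bit inputs
sign, exp = log2_quantize(w)
print(shift_accumulate(x, sign, exp), float(np.dot(x, w)))

In this sketch the BN parameters are folded before quantization, so what remains is a batch-normalization-free network whose power-of-two weights map directly onto shift-and-accumulate processing elements, matching the structure the abstract attributes to QR-DNN and RNA.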
Cheng LUO
Fudan University
Wei CAO
Fudan University
Lingli WANG
Fudan University
Philip H. W. LEONG
University of Sydney
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Cheng LUO, Wei CAO, Lingli WANG, Philip H. W. LEONG, "RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks" in IEICE TRANSACTIONS on Information and Systems,
vol. E102-D, no. 5, pp. 1037-1045, May 2019, doi: 10.1587/transinf.2018RCP0008.
Abstract: With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018RCP0008/_p
@ARTICLE{e102-d_5_1037,
author={Cheng LUO and Wei CAO and Lingli WANG and Philip H. W. LEONG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks},
year={2019},
volume={E102-D},
number={5},
pages={1037-1045},
abstract={With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.},
keywords={},
doi={10.1587/transinf.2018RCP0008},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1037
EP - 1045
AU - Cheng LUO
AU - Wei CAO
AU - Lingli WANG
AU - Philip H. W. LEONG
PY - 2019
DO - 10.1587/transinf.2018RCP0008
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E102-D
IS - 5
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - May 2019
AB - With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
ER -