Full Text Views: 91
Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
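The following Python sketch illustrates the filter-wise variable-precision idea described in the abstract: each convolution filter is assigned its own weight bit width, so the average bit precision of the layer drops below a uniform per-layer precision. This is only an illustrative sketch, not the authors' algorithm; the bit-width selection rule (smallest width whose relative quantization error stays under a tolerance), the candidate widths, and the tolerance value are assumptions made for demonstration.

# Illustrative sketch of filter-wise quantization with variable bit precision.
# NOT the paper's method: the selection rule (smallest width meeting an error
# tolerance) and the candidate bit widths are assumptions for illustration.
import numpy as np

def quantize_symmetric(w, bits):
    # Uniform symmetric quantization of one filter's weights to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if qmax > 0 else 1.0
    if scale == 0:
        return np.zeros_like(w)
    return np.round(w / scale) * scale

def filterwise_quantize(weights, candidate_bits=(2, 3, 4, 5, 6, 7, 8), tol=1e-2):
    # Pick, per output filter, the smallest bit width whose relative
    # quantization error is below `tol`; fall back to the largest width.
    out_channels = weights.shape[0]
    quantized = np.empty_like(weights)
    bits_per_filter = np.empty(out_channels, dtype=int)
    for f in range(out_channels):
        w = weights[f]
        for bits in candidate_bits:
            q = quantize_symmetric(w, bits)
            err = np.linalg.norm(w - q) / (np.linalg.norm(w) + 1e-12)
            if err <= tol or bits == candidate_bits[-1]:
                quantized[f], bits_per_filter[f] = q, bits
                break
    return quantized, bits_per_filter

# Example: a conv layer with 64 output filters of shape (in_ch=32, 3, 3).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(64, 32, 3, 3))
_, bits = filterwise_quantize(w)
# In a MAC array whose cycle count scales with (#MACs x weight bit width),
# lowering the average bit precision shortens execution time proportionally.
print("average bit precision:", bits.mean())

Because easy-to-quantize filters receive fewer bits while sensitive filters keep more, the average precision (and hence, on hardware whose cycles scale with weight bit width, the execution time) falls without forcing the whole layer to the worst-case precision.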
Asuka MAKI
Kioxia Corporation
Daisuke MIYASHITA
Kioxia Corporation
Shinichi SASAKI
Kioxia Corporation
Kengo NAKATA
Kioxia Corporation
Fumihiko TACHIBANA
Kioxia Corporation
Tomoya SUZUKI
Kioxia Corporation
Jun DEGUCHI
Kioxia Corporation
Ryuichi FUJIMOTO
Kioxia Corporation
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Asuka MAKI, Daisuke MIYASHITA, Shinichi SASAKI, Kengo NAKATA, Fumihiko TACHIBANA, Tomoya SUZUKI, Jun DEGUCHI, Ryuichi FUJIMOTO, "Weight Compression MAC Accelerator for Effective Inference of Deep Learning" in IEICE TRANSACTIONS on Electronics,
vol. E103-C, no. 10, pp. 514-523, October 2020, doi: 10.1587/transele.2019CTP0007.
Abstract: Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2019CTP0007/_p
@ARTICLE{e103-c_10_514,
author={Asuka MAKI and Daisuke MIYASHITA and Shinichi SASAKI and Kengo NAKATA and Fumihiko TACHIBANA and Tomoya SUZUKI and Jun DEGUCHI and Ryuichi FUJIMOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={Weight Compression MAC Accelerator for Effective Inference of Deep Learning},
year={2020},
volume={E103-C},
number={10},
pages={514-523},
abstract={Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.},
keywords={},
doi={10.1587/transele.2019CTP0007},
ISSN={1745-1353},
month={October},}
TY - JOUR
TI - Weight Compression MAC Accelerator for Effective Inference of Deep Learning
T2 - IEICE TRANSACTIONS on Electronics
SP - 514
EP - 523
AU - Asuka MAKI
AU - Daisuke MIYASHITA
AU - Shinichi SASAKI
AU - Kengo NAKATA
AU - Fumihiko TACHIBANA
AU - Tomoya SUZUKI
AU - Jun DEGUCHI
AU - Ryuichi FUJIMOTO
PY - 2020
DO - 10.1587/transele.2019CTP0007
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E103-C
IS - 10
JA - IEICE TRANSACTIONS on Electronics
Y1 - October 2020
AB - Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
ER -