The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
A novel configuration data compression technique for coarse-grained reconfigurable architectures (CGRAs) is proposed. Reducing the size of CGRA configuration data shortens the reconfiguration time, especially when the communication bandwidth between the CGRA and a host CPU is limited. It also reduces the energy consumed by the configuration cache and controller. The proposed technique builds on a multicast configuration technique called RoMultiC, which reduces the configuration time by multicasting the same data to multiple PEs (Processing Elements) using two bit-maps. Scheduling algorithms that optimize the order of multicasting have been proposed previously. However, multicasting is possible only when the PEs share exactly the same configuration. In general, configuration data for CGRAs can be divided into several fields, much like the machine code formats of general-purpose CPUs. The proposed scheme restricts each multicast to a subset of the fields so that more PEs can be multicast at once. This paper analyzes algorithms for finding a configuration pattern that maximizes the number of multicast PEs. We implemented the proposed scheme on CMA (Cool Mega Array), a straightforward CGRA, as a case study. Experimental results show that the proposed method yields configuration data up to 40.0% smaller than that of a previous method for an image processing application. An exploration of the multicast grain size reveals the effective grain size for each algorithm. Furthermore, because both the dynamic power consumption of the configuration controller and the configuration time are improved, the method achieves a 50.1% reduction in configuration energy with negligible area overhead.
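The difference between multicasting whole configurations and multicasting individual fields can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes the two bit-maps select rows and columns of the PE array (the abstract only says "two bit-maps"), uses a simple greedy cover instead of the scheduling algorithms the paper analyzes, and the names `count_transfers` and `field_grids` as well as the three-field PE format are hypothetical.

```python
from collections import defaultdict


def count_transfers(grid):
    """Greedily count the multicast transfers needed to load `grid`.

    Assumption: one transfer writes a single value to every PE at
    (row, col) with row in a row bit-map and col in a column bit-map.
    """
    targets = defaultdict(set)  # value -> set of (row, col) that need it
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            targets[value].add((r, c))

    transfers = 0
    for cells in targets.values():
        uncovered = set(cells)
        while uncovered:
            r0, _ = min(uncovered)  # seed row for this transfer
            # Columns of the seed row that need this value.
            cols = {c for (r, c) in cells if r == r0}
            # Rows in which all of those columns need the same value.
            rows = {r for (r, _) in cells
                    if all((r, c) in cells for c in cols)}
            uncovered -= {(r, c) for r in rows for c in cols}
            transfers += 1
    return transfers


def field_grids(grid):
    """Split a grid of field tuples into one grid per field."""
    n_fields = len(grid[0][0])
    return [[[pe[f] for pe in row] for row in grid]
            for f in range(n_fields)]


# Hypothetical 3x3 array: same opcode everywhere, one field that varies
# by row and one that varies by column.
grid = [[("ADD", r, c) for c in range(3)] for r in range(3)]

whole = count_transfers(grid)
per_field = [count_transfers(g) for g in field_grids(grid)]
print(whole, per_field)
```

In this toy array no two PEs share a complete configuration, so whole-configuration multicasting degenerates to one transfer per PE, while each field taken alone is shared across the array, a row, or a column. Each per-field transfer also carries a narrower payload; the actual saving depends on field widths and bit-map overhead, which this sketch does not model.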
Takuya KOJIMA
Keio University
Hideharu AMANO
Keio University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takuya KOJIMA, Hideharu AMANO, "A Fine-Grained Multicasting of Configuration Data for Coarse-Grained Reconfigurable Architectures" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 7, pp. 1247-1256, July 2019, doi: 10.1587/transinf.2018EDP7336.
Abstract: A novel configuration data compression technique for coarse-grained reconfigurable architectures (CGRAs) is proposed. Reducing the size of configuration data of CGRAs shortens the reconfiguration time especially when the communication bandwidth between a CGRA and a host CPU is limited. In addition, it saves energy consumption of configuration cache and controller. The proposed technique is based on a multicast configuration technique called RoMultiC, which reduces the configuration time by multicasting the same data to multiple PEs (Processing Elements) with two bit-maps. Scheduling algorithms for optimizing the order of multicasting have been proposed. However, the multicasting is possible only if each PE has completely the same configuration. In general, configuration data for CGRAs can be divided into some fields like machine code formats of general-purpose CPUs. The proposed scheme confines a part of fields for multicasting so that the possibility of multicasting more PEs can be increased. This paper analyzes algorithms to find a configuration pattern which maximizes the number of multicasted PEs. We implemented the proposed scheme to CMA (Cool Mega Array), a straightforward CGRA, as a case study. Experimental results show that the proposed method achieves 40.0% smaller configuration than a previous method for an image processing application at maximum. The exploration of the multicasted grain size reveals the effective grain size for each algorithm. Furthermore, since both a dynamic power consumption of the configuration controller and a configuration time are improved, it achieves 50.1% reduction of the energy consumption for the configuration with a negligible area overhead.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7336/_p
@ARTICLE{e102-d_7_1247,
author={Takuya KOJIMA and Hideharu AMANO},
journal={IEICE TRANSACTIONS on Information},
title={A Fine-Grained Multicasting of Configuration Data for Coarse-Grained Reconfigurable Architectures},
year={2019},
volume={E102-D},
number={7},
pages={1247-1256},
abstract={A novel configuration data compression technique for coarse-grained reconfigurable architectures (CGRAs) is proposed. Reducing the size of configuration data of CGRAs shortens the reconfiguration time especially when the communication bandwidth between a CGRA and a host CPU is limited. In addition, it saves energy consumption of configuration cache and controller. The proposed technique is based on a multicast configuration technique called RoMultiC, which reduces the configuration time by multicasting the same data to multiple PEs (Processing Elements) with two bit-maps. Scheduling algorithms for optimizing the order of multicasting have been proposed. However, the multicasting is possible only if each PE has completely the same configuration. In general, configuration data for CGRAs can be divided into some fields like machine code formats of general-purpose CPUs. The proposed scheme confines a part of fields for multicasting so that the possibility of multicasting more PEs can be increased. This paper analyzes algorithms to find a configuration pattern which maximizes the number of multicasted PEs. We implemented the proposed scheme to CMA (Cool Mega Array), a straightforward CGRA, as a case study. Experimental results show that the proposed method achieves 40.0% smaller configuration than a previous method for an image processing application at maximum. The exploration of the multicasted grain size reveals the effective grain size for each algorithm. Furthermore, since both a dynamic power consumption of the configuration controller and a configuration time are improved, it achieves 50.1% reduction of the energy consumption for the configuration with a negligible area overhead.},
keywords={},
doi={10.1587/transinf.2018EDP7336},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - A Fine-Grained Multicasting of Configuration Data for Coarse-Grained Reconfigurable Architectures
T2 - IEICE TRANSACTIONS on Information
SP - 1247
EP - 1256
AU - Takuya KOJIMA
AU - Hideharu AMANO
PY - 2019
DO - 10.1587/transinf.2018EDP7336
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2019
AB - A novel configuration data compression technique for coarse-grained reconfigurable architectures (CGRAs) is proposed. Reducing the size of configuration data of CGRAs shortens the reconfiguration time especially when the communication bandwidth between a CGRA and a host CPU is limited. In addition, it saves energy consumption of configuration cache and controller. The proposed technique is based on a multicast configuration technique called RoMultiC, which reduces the configuration time by multicasting the same data to multiple PEs (Processing Elements) with two bit-maps. Scheduling algorithms for optimizing the order of multicasting have been proposed. However, the multicasting is possible only if each PE has completely the same configuration. In general, configuration data for CGRAs can be divided into some fields like machine code formats of general-purpose CPUs. The proposed scheme confines a part of fields for multicasting so that the possibility of multicasting more PEs can be increased. This paper analyzes algorithms to find a configuration pattern which maximizes the number of multicasted PEs. We implemented the proposed scheme to CMA (Cool Mega Array), a straightforward CGRA, as a case study. Experimental results show that the proposed method achieves 40.0% smaller configuration than a previous method for an image processing application at maximum. The exploration of the multicasted grain size reveals the effective grain size for each algorithm. Furthermore, since both a dynamic power consumption of the configuration controller and a configuration time are improved, it achieves 50.1% reduction of the energy consumption for the configuration with a negligible area overhead.
ER -