The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms, not only in server systems but also in embedded systems. Manycore accelerators such as GPUs, as well as heterogeneous CPU cores, are also becoming popular in embedded domains. However, because an embedded GPU has far fewer cores than a server GPU, it is important to use both the heterogeneous multi-core CPUs and the GPU to achieve the desired throughput with minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, exploiting both task parallelism and data parallelism to achieve maximal energy efficiency under a real-time constraint. We first present the parallelization technique used for each task for GPU execution. We then propose performance and energy models for both task-parallel and data-parallel execution on heterogeneous processors, which are used in design space exploration to find the optimal mapping. The design space is huge because it covers not only processor heterogeneity, such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for data-parallel execution on these heterogeneous processors. In our case study of LBP face detection on the Exynos 5422, the estimation errors of the proposed performance and energy models were on average -2.19% and -3.67%, respectively. By systematically finding the optimal mappings with the proposed models, we achieved 28.6% lower energy consumption than with manual mapping, while still meeting the real-time constraint.
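The abstract does not give the form of the performance and energy models, so the following is only a minimal sketch of the general idea of exploring data partitioning ratios under a deadline. Everything in it is an assumption made for illustration: the `Processor` type, the `estimate` and `best_ratio` helpers, the linear timing model, the zero idle power, the absence of transfer overhead, and all numeric values. It is not the authors' model and does not reproduce their results.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor description; the numbers used below are made up."""
    name: str
    full_time_ms: float    # time to process the whole workload alone (ms)
    active_power_w: float  # average power while busy (W)


def estimate(ratio: float, a: Processor, b: Processor) -> Tuple[float, float]:
    """Return (latency_ms, energy_mj) when `a` gets `ratio` of the data and `b` the rest.

    Assumes perfectly divisible work, no transfer cost, and zero idle power.
    """
    time_a = ratio * a.full_time_ms
    time_b = (1.0 - ratio) * b.full_time_ms
    latency = max(time_a, time_b)  # the two processors run concurrently
    energy = time_a * a.active_power_w + time_b * b.active_power_w  # W * ms = mJ
    return latency, energy


def best_ratio(a: Processor, b: Processor, deadline_ms: float,
               steps: int = 10) -> Optional[Tuple[float, float, float]]:
    """Sweep the partitioning ratio and keep the lowest-energy feasible point.

    Returns (ratio, latency_ms, energy_mj), or None if no ratio meets the deadline.
    """
    best = None
    for i in range(steps + 1):
        ratio = i / steps
        latency, energy = estimate(ratio, a, b)
        if latency > deadline_ms:
            continue  # violates the real-time constraint
        if best is None or energy < best[2]:
            best = (ratio, latency, energy)
    return best


if __name__ == "__main__":
    cpu = Processor("cpu", full_time_ms=100.0, active_power_w=2.0)
    gpu = Processor("gpu", full_time_ms=50.0, active_power_w=3.0)
    print(best_ratio(cpu, gpu, deadline_ms=42.0))
```

With these made-up numbers, neither processor meets the 42 ms deadline alone (100 ms and 50 ms), but several split ratios do, and the sweep keeps the one with the lowest estimated energy. A real exploration would also have to cover the task-to-processor assignments and the big.LITTLE choices mentioned in the abstract, which is what makes the design space large.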
Chanyoung OH
University of Seoul
Saehanseul YI
University of Seoul
Youngmin YI
University of Seoul
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Chanyoung OH, Saehanseul YI, Youngmin YI, "Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 12, pp. 2878-2888, December 2018, doi: 10.1587/transinf.2018PAP0004.
Abstract: As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018PAP0004/_p
@ARTICLE{e101-d_12_2878,
author={Chanyoung OH and Saehanseul YI and Youngmin YI},
journal={IEICE TRANSACTIONS on Information},
title={Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms},
year={2018},
volume={E101-D},
number={12},
pages={2878-2888},
abstract={As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.},
keywords={},
doi={10.1587/transinf.2018PAP0004},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms
T2 - IEICE TRANSACTIONS on Information
SP - 2878
EP - 2888
AU - Chanyoung OH
AU - Saehanseul YI
AU - Youngmin YI
PY - 2018
DO - 10.1587/transinf.2018PAP0004
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2018
AB - As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.
ER -