The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.
Jianli CAO
Dalian University of Technology
Zhikui CHEN
Dalian University of Technology
Yuxin WANG
Dalian University of Technology
He GUO
Dalian University of Technology
Pengcheng WANG
Jianghuai College of Anhui University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jianli CAO, Zhikui CHEN, Yuxin WANG, He GUO, Pengcheng WANG, "Instruction Prefetch for Improving GPGPU Performance" in IEICE TRANSACTIONS on Fundamentals,
vol. E104-A, no. 5, pp. 773-785, May 2021, doi: 10.1587/transfun.2020EAP1105.
Abstract: Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2020EAP1105/_p
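The next-line prefetch scheme the abstract builds on can be illustrated with a toy I-cache model. This is a hedged sketch only: the cache organization, sizes, trace, and the `simulate` helper below are illustrative assumptions, not the paper's simulator or its GPU SIMT adaptation. It shows the core idea: on a miss of instruction line L, also fetch line L+1, which cuts misses for sequential instruction streams.

```python
# Toy model of next-line instruction prefetch (illustrative assumptions:
# fully-associative LRU I-cache, one fetch per instruction word).

def simulate(pc_trace, num_lines, line_words, prefetch_next):
    """Return the instruction miss rate over a trace of fetch addresses."""
    cache = []   # resident line tags, front = least recently used
    misses = 0
    for pc in pc_trace:
        line = pc // line_words
        if line in cache:
            cache.remove(line)
            cache.append(line)            # refresh LRU position
        else:
            misses += 1
            cache.append(line)
            if len(cache) > num_lines:
                cache.pop(0)              # evict LRU line
            if prefetch_next:             # next-line prefetch on a miss
                nxt = line + 1
                if nxt not in cache:
                    cache.append(nxt)
                    if len(cache) > num_lines:
                        cache.pop(0)
    return misses / len(pc_trace)

# A straight-line kernel body larger than the cache: purely sequential fetches.
trace = list(range(256))                  # 256 sequential instruction words
base = simulate(trace, num_lines=4, line_words=8, prefetch_next=False)
pref = simulate(trace, num_lines=4, line_words=8, prefetch_next=True)
print(base, pref)                         # prefetch halves the miss rate here
```

On this sequential trace every cache line misses once without prefetching (miss rate 32/256 = 0.125), while next-line prefetch converts every second line miss into a hit (16/256 = 0.0625). Real GPU kernels with branches and many warps behave less cleanly, which is why the paper tunes the prefetch parameters experimentally.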
@ARTICLE{e104-a_5_773,
author={Jianli CAO and Zhikui CHEN and Yuxin WANG and He GUO and Pengcheng WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Instruction Prefetch for Improving GPGPU Performance},
year={2021},
volume={E104-A},
number={5},
pages={773-785},
abstract={Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.},
keywords={},
doi={10.1587/transfun.2020EAP1105},
ISSN={1745-1337},
month={May},}
TY - JOUR
TI - Instruction Prefetch for Improving GPGPU Performance
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 773
EP - 785
AU - Jianli CAO
AU - Zhikui CHEN
AU - Yuxin WANG
AU - He GUO
AU - Pengcheng WANG
PY - 2021
DO - 10.1587/transfun.2020EAP1105
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E104-A
IS - 5
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - May 2021
AB - Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.
ER -