The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
Emerging byte-addressable non-volatile memory devices are attracting considerable attention. A non-volatile main memory (NVMM) built on them offers larger capacity and lower power consumption than a traditional DRAM main memory. To fully utilize an NVMM, software and hardware must be optimized cooperatively. At the same time, even at the level of the memory module, the microarchitecture is still evolving, although real non-volatile memory modules such as Intel Optane DC persistent memory (DCPMM) are already on the market. Among existing NVMM evaluation environments, software simulators can evaluate a variety of microarchitectures but at the cost of long simulation times, whereas emulators can evaluate the whole system quickly but are less flexible to configure. An NVMM emulator that enables both flexible and fast system evaluation therefore still has an important role in exploring the optimal system. In this paper, we introduce an NVMM emulator for embedded systems and use it to explore a direction for NVMM optimization techniques. The emulator is implemented on an SoC-FPGA board and employs three NVMM behaviour models: coarse-grain, fine-grain, and DCPMM-based. The coarse-grain and fine-grain models enable NVMM performance evaluation based on extensions of traditional DRAM behaviour, while the DCPMM-based model emulates the behaviour of a real DCPMM. A complete evaluation environment is also provided, including Linux kernel modifications and several runtime functions. We first validate the emulator against an existing NVMM emulator, a cycle-accurate NVMM simulator, and a real DCPMM. We then evaluate how program behaviour differs among the three models using SPEC CPU programs.
As a result, the fine-grain model reveals that program execution time is affected by the frequency of NVMM memory requests rather than by the cache hit ratio. When the fine-grain model is compared with the coarse-grain model under a configuration in which the former has a longer total write latency than the latter, the fine-grain model still shows shorter execution time for four of the fourteen programs, because it exploits bank-level parallelism and row-buffer access locality.
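The last finding, that a model with longer write latency can still finish sooner, can be illustrated with a toy timing model. The sketch below is not the paper's emulator: the function names, the latency values, the per-bank queueing policy, and the request traces are all hypothetical, chosen only to show how bank-level parallelism and row-buffer hits can outweigh a higher per-access latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    bank: int       # target bank index
    row: int        # target row within that bank
    is_write: bool  # True for a write, False for a read


def coarse_grain_time(requests, read_latency, write_latency):
    """Hypothetical coarse-grain model: every request pays a fixed
    latency and requests are serviced strictly one after another."""
    return sum(write_latency if r.is_write else read_latency for r in requests)


def fine_grain_time(requests, num_banks, hit_latency,
                    read_miss_latency, write_miss_latency, issue_interval=1):
    """Hypothetical fine-grain model: banks work in parallel, each bank
    keeps one open row, and a row-buffer hit is cheaper than a miss."""
    bank_free_at = [0] * num_banks   # time at which each bank becomes idle
    open_row = [None] * num_banks    # row currently held in each row buffer
    for i, r in enumerate(requests):
        arrival = i * issue_interval             # assumed fixed issue rate
        start = max(arrival, bank_free_at[r.bank])
        if open_row[r.bank] == r.row:
            latency = hit_latency                # row-buffer hit
        else:
            latency = write_miss_latency if r.is_write else read_miss_latency
            open_row[r.bank] = r.row             # miss opens the new row
        bank_free_at[r.bank] = start + latency
    return max(bank_free_at, default=0)


# Eight writes spread over four banks, always to the same row in each bank.
friendly = [Request(bank=i % 4, row=0, is_write=True) for i in range(8)]
# Eight writes to one bank, each to a different row (no parallelism, no hits).
hostile = [Request(bank=0, row=i, is_write=True) for i in range(8)]

coarse = dict(read_latency=50, write_latency=100)
fine = dict(num_banks=4, hit_latency=20,
            read_miss_latency=60, write_miss_latency=150)

print(coarse_grain_time(friendly, **coarse))  # 800
print(fine_grain_time(friendly, **fine))      # 173
print(fine_grain_time(hostile, **fine))       # 1200
```

In this toy setup the fine-grain configuration has the longer write-miss latency (150 versus 100 time units), yet the bank-friendly trace completes much sooner than under the coarse-grain model, while the single-bank trace is slower. That is consistent in spirit with only some programs benefiting, although the real models and workloads are far more detailed.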
Yu OMORI
Waseda University
Keiji KIMURA
Waseda University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yu OMORI, Keiji KIMURA, "Non-Volatile Main Memory Emulator for Embedded Systems Employing Three NVMM Behaviour Models" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 5, pp. 697-708, May 2021, doi: 10.1587/transinf.2020EDP7092.
Abstract: Emerging byte-addressable non-volatile memory devices attract much attention. A non-volatile main memory (NVMM) built on them enables larger memory size and lower power consumption than a traditional DRAM main memory. To fully utilize an NVMM, both software and hardware must be cooperatively optimized. Simultaneously, even focusing on a memory module, its micro architecture is still being developed though real non-volatile memory modules, such as Intel Optane DC persistent memory (DCPMM), have been on the market. Looking at existing NVMM evaluation environments, software simulators can evaluate various micro architectures with their long simulation time. Emulators can evaluate the whole system fast with less flexibility in their configuration than simulators. Thus, an NVMM emulator that can realize flexible and fast system evaluation still has an important role to explore the optimal system. In this paper, we introduce an NVMM emulator for embedded systems and explore a direction of optimization techniques for NVMMs by using it. It is implemented on an SoC-FPGA board employing three NVMM behaviour models: coarse-grain, fine-grain and DCPMM-based. The coarse and fine models enable NVMM performance evaluations based on extensions of traditional DRAM behaviour. The DCPMM-based model emulates the behaviour of a real DCPMM. Whole evaluation environment is also provided including Linux kernel modifications and several runtime functions. We first validate the developed emulator with an existing NVMM emulator, a cycle-accurate NVMM simulator and a real DCPMM. Then, the program behavior differences among three models are evaluated with SPEC CPU programs. As a result, the fine-grain model reveals the program execution time is affected by the frequency of NVMM memory requests rather than the cache hit ratio. 
Comparing with the fine-grain model and the coarse-grain model under the condition of the former's longer total write latency than the latter's, the former shows lower execution time for four of fourteen programs than the latter because of the bank-level parallelism and the row-buffer access locality exploited by the former model.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7092/_p
@ARTICLE{e104-d_5_697,
author={Yu OMORI and Keiji KIMURA},
journal={IEICE TRANSACTIONS on Information},
title={Non-Volatile Main Memory Emulator for Embedded Systems Employing Three NVMM Behaviour Models},
year={2021},
volume={E104-D},
number={5},
pages={697-708},
abstract={Emerging byte-addressable non-volatile memory devices attract much attention. A non-volatile main memory (NVMM) built on them enables larger memory size and lower power consumption than a traditional DRAM main memory. To fully utilize an NVMM, both software and hardware must be cooperatively optimized. Simultaneously, even focusing on a memory module, its micro architecture is still being developed though real non-volatile memory modules, such as Intel Optane DC persistent memory (DCPMM), have been on the market. Looking at existing NVMM evaluation environments, software simulators can evaluate various micro architectures with their long simulation time. Emulators can evaluate the whole system fast with less flexibility in their configuration than simulators. Thus, an NVMM emulator that can realize flexible and fast system evaluation still has an important role to explore the optimal system. In this paper, we introduce an NVMM emulator for embedded systems and explore a direction of optimization techniques for NVMMs by using it. It is implemented on an SoC-FPGA board employing three NVMM behaviour models: coarse-grain, fine-grain and DCPMM-based. The coarse and fine models enable NVMM performance evaluations based on extensions of traditional DRAM behaviour. The DCPMM-based model emulates the behaviour of a real DCPMM. Whole evaluation environment is also provided including Linux kernel modifications and several runtime functions. We first validate the developed emulator with an existing NVMM emulator, a cycle-accurate NVMM simulator and a real DCPMM. Then, the program behavior differences among three models are evaluated with SPEC CPU programs. As a result, the fine-grain model reveals the program execution time is affected by the frequency of NVMM memory requests rather than the cache hit ratio. 
Comparing with the fine-grain model and the coarse-grain model under the condition of the former's longer total write latency than the latter's, the former shows lower execution time for four of fourteen programs than the latter because of the bank-level parallelism and the row-buffer access locality exploited by the former model.},
keywords={},
doi={10.1587/transinf.2020EDP7092},
ISSN={1745-1361},
month={May},
}
TY - JOUR
TI - Non-Volatile Main Memory Emulator for Embedded Systems Employing Three NVMM Behaviour Models
T2 - IEICE TRANSACTIONS on Information
SP - 697
EP - 708
AU - Yu OMORI
AU - Keiji KIMURA
PY - 2021
DO - 10.1587/transinf.2020EDP7092
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2021
AB - Emerging byte-addressable non-volatile memory devices attract much attention. A non-volatile main memory (NVMM) built on them enables larger memory size and lower power consumption than a traditional DRAM main memory. To fully utilize an NVMM, both software and hardware must be cooperatively optimized. Simultaneously, even focusing on a memory module, its micro architecture is still being developed though real non-volatile memory modules, such as Intel Optane DC persistent memory (DCPMM), have been on the market. Looking at existing NVMM evaluation environments, software simulators can evaluate various micro architectures with their long simulation time. Emulators can evaluate the whole system fast with less flexibility in their configuration than simulators. Thus, an NVMM emulator that can realize flexible and fast system evaluation still has an important role to explore the optimal system. In this paper, we introduce an NVMM emulator for embedded systems and explore a direction of optimization techniques for NVMMs by using it. It is implemented on an SoC-FPGA board employing three NVMM behaviour models: coarse-grain, fine-grain and DCPMM-based. The coarse and fine models enable NVMM performance evaluations based on extensions of traditional DRAM behaviour. The DCPMM-based model emulates the behaviour of a real DCPMM. Whole evaluation environment is also provided including Linux kernel modifications and several runtime functions. We first validate the developed emulator with an existing NVMM emulator, a cycle-accurate NVMM simulator and a real DCPMM. Then, the program behavior differences among three models are evaluated with SPEC CPU programs. As a result, the fine-grain model reveals the program execution time is affected by the frequency of NVMM memory requests rather than the cache hit ratio. 
Comparing with the fine-grain model and the coarse-grain model under the condition of the former's longer total write latency than the latter's, the former shows lower execution time for four of fourteen programs than the latter because of the bank-level parallelism and the row-buffer access locality exploited by the former model.
ER -