Young H. OH
Sungkyunkwan University
Yunho JIN
Seoul National University
Tae Jun HAM
Seoul National University
Jae W. LEE
Seoul National University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Young H. OH, Yunho JIN, Tae Jun HAM, Jae W. LEE, "Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units" in IEICE TRANSACTIONS on Information,
vol. E105-D, no. 2, pp. 427-431, February 2022, doi: 10.1587/transinf.2021EDL8084.
Abstract: Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and required to satisfy the two, often conflicting, optimization goals: maximizing system throughput and satisfying quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.
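The abstract describes interleaving layers of multiple DNN requests so that the NPU stays busy while individual deadlines are met. As a rough illustration of that idea (not the paper's actual Layerweaver+ algorithm), the sketch below greedily runs one layer at a time from whichever pending request currently has the least slack, i.e., the smallest gap between its deadline and its remaining work. All names and the slack heuristic here are illustrative assumptions.

```python
def layerwise_schedule(requests, now=0.0):
    """Greedy layer-wise interleaving sketch (illustrative, not Layerweaver+).

    requests: list of (deadline, [layer_durations]) tuples.
    Returns (order, missed): the executed (request_id, layer_index) pairs
    and the set of request ids whose deadlines were violated.
    """
    pending = {i: list(layers) for i, (_, layers) in enumerate(requests)}
    deadlines = {i: dl for i, (dl, _) in enumerate(requests)}
    order, missed = [], set()
    t = now
    while pending:
        # Slack = deadline minus (current time + remaining work);
        # schedule one layer of the most urgent request first.
        i = min(pending, key=lambda r: deadlines[r] - (t + sum(pending[r])))
        layer_idx = len(requests[i][1]) - len(pending[i])
        t += pending[i].pop(0)
        order.append((i, layer_idx))
        if not pending[i]:
            if t > deadlines[i]:
                missed.add(i)
            del pending[i]
    return order, missed
```

For example, with a long request (deadline 10, layers [3, 3]) and a short urgent one (deadline 4, layers [2]), the urgent request's single layer runs first, and both finish on time.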
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8084/_p
@ARTICLE{e105-d_2_427,
author={Young H. OH and Yunho JIN and Tae Jun HAM and Jae W. LEE},
journal={IEICE TRANSACTIONS on Information},
title={Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units},
year={2022},
volume={E105-D},
number={2},
pages={427-431},
abstract={Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and required to satisfy the two, often conflicting, optimization goals: maximizing system throughput and satisfying quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.},
keywords={},
doi={10.1587/transinf.2021EDL8084},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units
T2 - IEICE TRANSACTIONS on Information
SP - 427
EP - 431
AU - Young H. OH
AU - Yunho JIN
AU - Tae Jun HAM
AU - Jae W. LEE
PY - 2022
DO - 10.1587/transinf.2021EDL8084
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2022
AB - Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and required to satisfy the two, often conflicting, optimization goals: maximizing system throughput and satisfying quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.
ER -