Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ngo Anh VIEN, Nguyen Hoang VIET, SeungGwan LEE, TaeChoong CHUNG, "Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks" in IEICE TRANSACTIONS on Communications,
vol. E92-B, no. 6, pp. 2008-2022, June 2009, doi: 10.1587/transcom.E92.B.2008.
Abstract: In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov Decision Process (SMDP) problem, but it is too complex to allow an exact solution. Thus, in this paper, a policy gradient algorithm, together with a decomposition approach, is proposed to find the dynamic (state-dependent) optimal CAC and routing policy within a parameterized policy space. To implement this gradient algorithm, we approximate the gradient of the average reward. We then present a simulation-based algorithm (the GSMDP algorithm) that estimates this approximate gradient using only a single sample path of the Markov chain underlying the SMDP formulation of the CAC and routing problem. The algorithm improves performance in terms of convergence speed, rejection probability, robustness to changing arrival statistics, and overall received average revenue. Experimental simulations compare our method's performance with existing methods and demonstrate its robustness.
URL: https://global.ieice.org/en_transactions/communications/10.1587/transcom.E92.B.2008/_p
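The abstract above describes, at a high level, a simulation-based policy-gradient method that estimates the gradient of the average reward from a single sample path of the underlying Markov chain. As a loose illustration of that general idea only, and not of the paper's GSMDP algorithm or its decomposition approach, the following Python sketch runs a GPOMDP-style single-sample-path gradient estimator on a toy one-link admission-control SMDP with two call classes. Every constant, feature, and name below is an assumption introduced for the example.

"""
Illustrative sketch only: a GPOMDP-style, single-sample-path policy-gradient
estimator for a toy one-link call admission control SMDP.  This is NOT the
paper's GSMDP algorithm or decomposition; all constants, features, and
function names here are assumptions made for the example.
"""
import numpy as np

# Toy model: one link, two call classes with different bandwidth and revenue.
CAPACITY = 10                           # bandwidth units on the link
BANDWIDTH = np.array([1, 2])            # units required per call of class 0 / 1
REWARD = np.array([1.0, 5.0])           # revenue per admitted call
ARRIVAL_RATE = np.array([0.6, 0.3])     # Poisson arrival rates per class
DEPARTURE_RATE = np.array([0.2, 0.2])   # per-call exponential departure rates

rng = np.random.default_rng(0)

def features(occupancy, cls):
    """Feature vector for the admission decision (chosen arbitrarily here)."""
    free = CAPACITY - occupancy @ BANDWIDTH
    return np.array([1.0, free / CAPACITY, float(cls)])

def admit_prob(theta, phi):
    """Logistic (two-action softmax) parameterized admission policy."""
    return 1.0 / (1.0 + np.exp(-theta @ phi))

def estimate_gradient(theta, horizon=20_000, beta=0.99):
    """Estimate the gradient of average revenue per unit time from one sample path."""
    occupancy = np.zeros(2)             # number of active calls per class
    z = np.zeros_like(theta)            # discounted eligibility trace
    grad = np.zeros_like(theta)
    revenue, elapsed = 0.0, 0.0
    for _ in range(horizon):
        # Next event via competing exponentials (arrivals and departures).
        rates = np.concatenate([ARRIVAL_RATE, occupancy * DEPARTURE_RATE])
        total = rates.sum()
        elapsed += rng.exponential(1.0 / total)
        event = rng.choice(len(rates), p=rates / total)
        reward = 0.0
        if event < 2:                   # arrival of a class `event` call
            cls = int(event)
            if occupancy @ BANDWIDTH + BANDWIDTH[cls] <= CAPACITY:
                phi = features(occupancy, cls)
                p = admit_prob(theta, phi)
                admit = rng.random() < p
                # Score function of the sampled action under the logistic policy.
                z = beta * z + ((1.0 - p) * phi if admit else -p * phi)
                if admit:
                    occupancy[cls] += 1
                    reward = REWARD[cls]
            # Arrivals that do not fit are blocked; no policy decision is made.
        else:                           # departure of a class (event - 2) call
            occupancy[event - 2] -= 1
        revenue += reward
        grad += reward * z              # GPOMDP-style gradient accumulation
    return grad / horizon, revenue / elapsed

# Plain stochastic gradient ascent on the policy parameters.
theta = np.zeros(3)
for it in range(50):
    g, avg_revenue = estimate_gradient(theta)
    theta += 0.5 * g
    if it % 10 == 0:
        print(f"iter {it:3d}  average revenue per unit time = {avg_revenue:.3f}")

In this sketch the logistic policy stands in for the parameterized policy space, and the discount factor beta trades bias for variance in the single-path gradient estimate, in the spirit of GPOMDP-type estimators; the paper's actual method, its decomposition, and its routing component are not reproduced here.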
@ARTICLE{e92-b_6_2008,
author={Ngo Anh VIEN and Nguyen Hoang VIET and SeungGwan LEE and TaeChoong CHUNG},
journal={IEICE TRANSACTIONS on Communications},
title={Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks},
year={2009},
volume={E92-B},
number={6},
pages={2008-2022},
abstract={In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov Decision Process (SMDP) problem, but is too complex to allow for an exact solution. Thus in this paper, a policy gradient algorithm, together with a decomposition approach, is proposed to find the dynamic (state-dependent) optimal CAC and routing policy among a parameterized policy space. To implement that gradient algorithm, we approximate the gradient of the average reward. Then, we present a simulation-based algorithm to estimate the approximate gradient of the average reward (called GSMDP algorithm), using only a single sample path of the underlying Markov chain for the SMDP of CAC and routing problem. The algorithm enhances performance in terms of convergence speed, rejection probability, robustness to the changing arrival statistics and an overall received average revenue. The experimental simulations will compare our method's performance with other existing methods and show the robustness of our method.},
keywords={},
doi={10.1587/transcom.E92.B.2008},
ISSN={1745-1345},
month={June},}
TY - JOUR
TI - Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks
T2 - IEICE TRANSACTIONS on Communications
SP - 2008
EP - 2022
AU - Ngo Anh VIEN
AU - Nguyen Hoang VIET
AU - SeungGwan LEE
AU - TaeChoong CHUNG
PY - 2009
DO - 10.1587/transcom.E92.B.2008
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E92-B
IS - 6
JA - IEICE TRANSACTIONS on Communications
Y1 - June 2009
AB - In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov Decision Process (SMDP) problem, but is too complex to allow for an exact solution. Thus in this paper, a policy gradient algorithm, together with a decomposition approach, is proposed to find the dynamic (state-dependent) optimal CAC and routing policy among a parameterized policy space. To implement that gradient algorithm, we approximate the gradient of the average reward. Then, we present a simulation-based algorithm to estimate the approximate gradient of the average reward (called GSMDP algorithm), using only a single sample path of the underlying Markov chain for the SMDP of CAC and routing problem. The algorithm enhances performance in terms of convergence speed, rejection probability, robustness to the changing arrival statistics and an overall received average revenue. The experimental simulations will compare our method's performance with other existing methods and show the robustness of our method.
ER -