The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals may appear as "XNUMX").
JianFeng WU
Hangzhou Dianzi University
HuiBin QIN
Hangzhou Dianzi University
YongZhu HUA
Hangzhou Dianzi University
LiHuan SHAO
Hangzhou Dianzi University
Ji HU
Hangzhou Dianzi University
ShengYing YANG
Hangzhou Dianzi University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
JianFeng WU, HuiBin QIN, YongZhu HUA, LiHuan SHAO, Ji HU, ShengYing YANG, "Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 10, pp. 2047-2050, October 2019, doi: 10.1587/transinf.2019EDL8023.
Abstract: This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
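The abstract names three fine-tuning methods for closing the gap between the auto-encoder's continuous coding units and true binary codes. The following NumPy sketch illustrates those three ideas only in spirit: the paper's actual architecture, noise level, and penalty weight are not stated here, so every function name and value below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def coding_layer(x, W, b, noise_std=0.1, binarize=False):
    """Sigmoid coding layer of a toy auto-encoder (illustrative only)."""
    # Method 1: add Gaussian noise to the input of the coding layer,
    # which pushes the sigmoid units toward saturation during training.
    x_noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    h = 1.0 / (1.0 + np.exp(-(x_noisy @ W + b)))
    if binarize:
        # Method 2: force the coding output to be binary (hard rounding;
        # in training this is typically paired with a straight-through gradient).
        h = np.round(h)
    return h

def non_binary_penalty(h, weight=0.01):
    # Method 3: a non-binary penalty term for the loss function.
    # h * (1 - h) is zero when h is exactly 0 or 1 and maximal at h = 0.5,
    # so adding it to the reconstruction loss discourages non-binary codes.
    return weight * np.mean(h * (1.0 - h))
```

With an 18-unit coding layer (matching the paper's 18-bit codes), `coding_layer(x, W, b, binarize=True)` yields a vector of 0s and 1s that can serve as the quantization index, while `non_binary_penalty` would be added to the reconstruction loss during fine-tuning.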
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8023/_p
@ARTICLE{e102-d_10_2047,
author={JianFeng WU and HuiBin QIN and YongZhu HUA and LiHuan SHAO and Ji HU and ShengYing YANG},
journal={IEICE TRANSACTIONS on Information},
title={Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network},
year={2019},
volume={E102-D},
number={10},
pages={2047-2050},
abstract={This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.},
keywords={},
doi={10.1587/transinf.2019EDL8023},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network
T2 - IEICE TRANSACTIONS on Information
SP - 2047
EP - 2050
AU - JianFeng WU
AU - HuiBin QIN
AU - YongZhu HUA
AU - LiHuan SHAO
AU - Ji HU
AU - ShengYing YANG
PY - 2019
DO - 10.1587/transinf.2019EDL8023
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - 2019/10//
AB - This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
ER -