The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals may be rendered as "XNUMX".
We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among the many pre-stored target speakers used for building the EV-GMM. To address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.
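The abstract names global variance and oversmoothing without defining them. As a rough illustration only, the sketch below assumes the common definition of global variance as the per-dimension variance of a feature trajectory over one utterance, and uses a plain moving average as a stand-in for oversmoothing. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def global_variance(frames):
    """Per-dimension variance over time for one utterance.

    frames: list of equal-length feature vectors, one per time frame.
    Assumed definition; the paper's exact formulation may differ.
    """
    dims = len(frames[0])
    means = [mean([f[d] for f in frames]) for d in range(dims)]
    return [
        mean([(f[d] - means[d]) ** 2 for f in frames])
        for d in range(dims)
    ]


def moving_average(frames, width=3):
    """Crude stand-in for oversmoothing: average each frame with its neighbours."""
    half = width // 2
    dims = len(frames[0])
    smoothed = []
    for t in range(len(frames)):
        window = frames[max(0, t - half): t + half + 1]
        smoothed.append([mean([f[d] for f in window]) for d in range(dims)])
    return smoothed


# Toy one-dimensional trajectory (made-up values for illustration).
natural = [[0.0], [2.0], [0.0], [2.0], [0.0], [2.0]]
smoothed = moving_average(natural)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
print(gv_natural, gv_smoothed)
```

In this toy case the smoothed trajectory keeps the same mean but has a much smaller variance, which is the kind of shrinkage a conversion algorithm that considers global variance is meant to counteract.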
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yamato OHTANI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO, "Improvements of the One-to-Many Eigenvoice Conversion System" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 9, pp. 2491-2499, September 2010, doi: 10.1587/transinf.E93.D.2491.
Abstract: We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.2491/_p
@ARTICLE{e93-d_9_2491,
author={Yamato OHTANI and Tomoki TODA and Hiroshi SARUWATARI and Kiyohiro SHIKANO},
journal={IEICE TRANSACTIONS on Information},
title={Improvements of the One-to-Many Eigenvoice Conversion System},
year={2010},
volume={E93-D},
number={9},
pages={2491--2499},
abstract={We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.},
keywords={},
doi={10.1587/transinf.E93.D.2491},
ISSN={1745-1361},
month={September},}
TY  - JOUR
TI  - Improvements of the One-to-Many Eigenvoice Conversion System
T2  - IEICE TRANSACTIONS on Information
SP  - 2491
EP  - 2499
AU  - Yamato OHTANI
AU  - Tomoki TODA
AU  - Hiroshi SARUWATARI
AU  - Kiyohiro SHIKANO
PY  - 2010
DO  - 10.1587/transinf.E93.D.2491
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E93-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2010
AB  - We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.
ER  -