This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In this speaker space, each speaker is represented by a small number of weight parameters for eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach can resolve an inherent problem of the supervector representation, and it improves voice conversion performance. We also investigate the effects of speaker adaptive training before factorization. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
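The abstract contrasts two ways of placing speakers in a low-dimensional space: flattening each speaker's GMM means into one supervector and applying PCA (the EV-GMM idea), versus keeping each speaker as a D × M matrix (rows: mean-vector dimension, columns: Gaussian component) and factorizing the whole set of matrices. The following is a minimal sketch of both representations, assuming NumPy and synthetic random means in place of trained GMMs. The factorization shown is a generic Tucker-2 decomposition computed by truncated one-pass HOSVD, chosen only for illustration; it is not presented as the paper's tensor factor analysis, and speaker adaptive training and the conversion step itself are omitted. All function names (`eigenvoice_space`, `tucker2_space`, and so on) are hypothetical.

```python
import numpy as np


def to_matrix(means):
    """Arrange one speaker's GMM means as a D x M matrix.

    `means` has shape (M, D): one D-dimensional mean vector per Gaussian
    component. Rows of the result index the feature dimension and columns
    index the Gaussian component.
    """
    return np.asarray(means, dtype=float).T


def to_supervector(means):
    """Concatenate the M mean vectors into one (M * D,) supervector."""
    return np.asarray(means, dtype=float).reshape(-1)


def eigenvoice_space(speakers, k):
    """PCA over supervectors: a bias supervector plus k eigen-supervectors."""
    X = np.stack([to_supervector(m) for m in speakers])  # (S, M*D)
    bias = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centred data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - bias, full_matrices=False)
    return bias, vt[:k]  # (M*D,), (k, M*D)


def eigenvoice_weights(means, bias, basis):
    """Project one speaker onto the eigen-supervectors (k weights)."""
    return basis @ (to_supervector(means) - bias)


def tucker2_space(speakers, r_dim, r_mix):
    """Truncated one-pass HOSVD of the (S, D, M) speaker tensor.

    The speaker mode is left uncompressed (a Tucker-2 model), so each
    speaker keeps its own core matrix: A_s ~ mean + U @ W_s @ V.T, with
    U (D x r_dim) for the dimension axis and V (M x r_mix) for components.
    """
    T = np.stack([to_matrix(m) for m in speakers])  # (S, D, M)
    mean = T.mean(axis=0)
    C = T - mean
    S, D, M = C.shape
    # Mode unfoldings: (D, S*M) for the dimension axis, (M, S*D) for components.
    unfold_dim = np.moveaxis(C, 1, 0).reshape(D, S * M)
    unfold_mix = np.moveaxis(C, 2, 0).reshape(M, S * D)
    # Leading left singular vectors of each unfolding give the factor matrices.
    U = np.linalg.svd(unfold_dim, full_matrices=False)[0][:, :r_dim]
    V = np.linalg.svd(unfold_mix, full_matrices=False)[0][:, :r_mix]
    return mean, U, V


def tucker2_weights(means, mean, U, V):
    """Core matrix W_s (r_dim x r_mix) that represents one speaker."""
    return U.T @ (to_matrix(means) - mean) @ V


def tucker2_reconstruct(W, mean, U, V):
    """Map a core matrix back to a D x M matrix of GMM means."""
    return mean + U @ W @ V.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, M, D = 6, 8, 4  # speakers, Gaussian components, feature dimension
    # Synthetic stand-ins for trained speaker GMM means (not real data).
    speakers = [rng.normal(size=(M, D)) for _ in range(S)]

    bias, E = eigenvoice_space(speakers, k=3)
    print(eigenvoice_weights(speakers[0], bias, E).shape)  # (3,)

    mean, U, V = tucker2_space(speakers, r_dim=2, r_mix=3)
    print(tucker2_weights(speakers[0], mean, U, V).shape)  # (2, 3)
```

In this sketch, PCA on the supervector treats all M × D entries as one unstructured vector and describes a speaker by k weights, whereas the Tucker-2 variant estimates one factor matrix along the dimension axis and one along the component axis, shares both across speakers, and describes a speaker by an r_dim × r_mix core matrix.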
Daisuke SAITO
The University of Tokyo
Nobuaki MINEMATSU
The University of Tokyo
Keikichi HIROSE
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Daisuke SAITO, Nobuaki MINEMATSU, Keikichi HIROSE, "Tensor Factor Analysis for Arbitrary Speaker Conversion," in IEICE TRANSACTIONS on Information and Systems, vol. E103-D, no. 6, pp. 1395-1405, June 2020, doi: 10.1587/transinf.2019EDP7166.
Abstract: This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7166/_p
@ARTICLE{e103-d_6_1395,
author={Daisuke SAITO and Nobuaki MINEMATSU and Keikichi HIROSE},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Tensor Factor Analysis for Arbitrary Speaker Conversion},
year={2020},
volume={E103-D},
number={6},
pages={1395--1405},
abstract={This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.},
keywords={},
doi={10.1587/transinf.2019EDP7166},
ISSN={1745-1361},
month={June},}
TY  - JOUR
TI  - Tensor Factor Analysis for Arbitrary Speaker Conversion
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 1395
EP  - 1405
AU  - Daisuke SAITO
AU  - Nobuaki MINEMATSU
AU  - Keikichi HIROSE
PY  - 2020
DO  - 10.1587/transinf.2019EDP7166
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E103-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2020/06//
AB  - This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
ER  - 