The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of prosody control factors using a moderate amount of training data. It is based on recursive splitting of the predictor variable space and on partial imposition of linear-independence constraints among predictor variables. It incorporates both linear regression and tree regression with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is introduced to account for the hierarchical structure of prosody control. The method is applied to modelling speech segment duration. Experimental results show that the proposed method yields better duration models than linear and tree regression with the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
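The abstract does not state how the hierarchical error function is defined. As a minimal sketch, assuming one plausible form (squared error on phoneme durations plus a weighted squared error on syllable durations, where each syllable's duration is taken to be the sum of its phoneme durations), the idea could be computed as below. The function `hierarchical_error`, its `syllable_weight` parameter, and this formulation are hypothetical illustrations, not the definition used in the paper.

```python
from typing import Dict, Sequence


def hierarchical_error(
    actual: Sequence[float],
    predicted: Sequence[float],
    syllable_of: Sequence[int],
    syllable_weight: float = 1.0,
) -> float:
    """Hypothetical two-level error for phoneme durations.

    Illustrative assumption only, not the formula from the paper:
    phoneme-level squared error plus `syllable_weight` times the squared
    error of syllable durations, where a syllable's duration is the sum
    of the durations of the phonemes assigned to it by `syllable_of`.
    """
    if not (len(actual) == len(predicted) == len(syllable_of)):
        raise ValueError("actual, predicted and syllable_of must have equal length")

    # Level 1: squared error of each individual phoneme duration.
    phoneme_term = sum((a - p) ** 2 for a, p in zip(actual, predicted))

    # Level 2: accumulate actual and predicted durations per syllable.
    syl_actual: Dict[int, float] = {}
    syl_pred: Dict[int, float] = {}
    for a, p, s in zip(actual, predicted, syllable_of):
        syl_actual[s] = syl_actual.get(s, 0.0) + a
        syl_pred[s] = syl_pred.get(s, 0.0) + p
    syllable_term = sum((syl_actual[s] - syl_pred[s]) ** 2 for s in syl_actual)

    return phoneme_term + syllable_weight * syllable_term


if __name__ == "__main__":
    # Hypothetical durations in milliseconds; phonemes 0 and 1 form syllable 0.
    print(hierarchical_error([80, 60, 120], [70, 70, 110], [0, 0, 1]))  # 400.0
```

In this hypothetical example every phoneme is off by 10 ms, so the phoneme-level term is 300. The two errors inside the first syllable cancel, so only the second syllable adds 100 at the syllable level, giving 400.0 with `syllable_weight=1.0`. Under this assumed form, a larger weight favours models that get syllable durations right even when individual phoneme durations are less accurate.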
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Naoto IWAHASHI, Yoshinori SAGISAKA, "Statistical Modelling of Speech Segment Duration by Constrained Tree Regression" in IEICE TRANSACTIONS on Information, vol. E83-D, no. 7, pp. 1550-1559, July 2000.
Abstract: This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can make suitable representation of complex effects of control factors for prosody with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have been conventionally used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to consider hierarchical structure in prosody control. This new method is applied to modelling of speech segmental duration. Experimental results show that better duration models are obtained by using the proposed regression method compared with linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
URL: https://global.ieice.org/en_transactions/information/10.1587/e83-d_7_1550/_p
@ARTICLE{e83-d_7_1550,
author={Naoto IWAHASHI and Yoshinori SAGISAKA},
journal={IEICE TRANSACTIONS on Information},
title={Statistical Modelling of Speech Segment Duration by Constrained Tree Regression},
year={2000},
volume={E83-D},
number={7},
pages={1550-1559},
abstract={This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can make suitable representation of complex effects of control factors for prosody with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have been conventionally used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to consider hierarchical structure in prosody control. This new method is applied to modelling of speech segmental duration. Experimental results show that better duration models are obtained by using the proposed regression method compared with linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.},
month={July},
}
TY  - JOUR
TI  - Statistical Modelling of Speech Segment Duration by Constrained Tree Regression
T2  - IEICE TRANSACTIONS on Information
SP  - 1550
EP  - 1559
AU  - Naoto IWAHASHI
AU  - Yoshinori SAGISAKA
PY  - 2000
JO  - IEICE TRANSACTIONS on Information
VL  - E83-D
IS  - 7
JA  - IEICE TRANSACTIONS on Information
Y1  - 2000/07//
AB  - This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can make suitable representation of complex effects of control factors for prosody with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have been conventionally used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to consider hierarchical structure in prosody control. This new method is applied to modelling of speech segmental duration. Experimental results show that better duration models are obtained by using the proposed regression method compared with linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
ER  - 