This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances do not share the same phrase. The core of the proposed method is the use of digit alignment information within the i-vector framework. By exploiting forced-alignment information, the verification scores of test trials can be computed as in the fixed-phrase situation, in which the speech segments compared between the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a unique phonetically constrained i-vector extractor is then applied to obtain a representation of speaker and channel variability for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM, by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
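The digit-level scoring and score normalization described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: cosine similarity stands in for the PLDA scorer, simple averaging is an assumed way of combining digit scores (the abstract only says the scores are combined), and all function and variable names are hypothetical. The s-norm shown is the common symmetric form, the mean of the two z-normalized scores computed against an enrollment-side and a test-side cohort.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the PLDA score in the paper."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm(raw, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization: average of two z-normalized scores.

    enroll_cohort_scores: scores of the enrollment model against a cohort.
    test_cohort_scores:   scores of the test segment against the same cohort.
    """
    z_enroll = (raw - mean(enroll_cohort_scores)) / pstdev(enroll_cohort_scores)
    z_test = (raw - mean(test_cohort_scores)) / pstdev(test_cohort_scores)
    return 0.5 * (z_enroll + z_test)


def combine_digit_scores(enroll, test, score_fn=cosine):
    """Score each test digit segment against enrollment segments of the same digit.

    enroll: dict mapping a digit label to a list of per-segment vectors.
    test:   list of (digit label, vector) pairs from the aligned test utterance.
    Returns the average of the per-digit scores (averaging is an assumption).
    """
    digit_scores = []
    for digit, vec in test:
        refs = enroll.get(digit)
        if not refs:
            continue  # digit never seen at enrollment: nothing to compare against
        digit_scores.append(mean(score_fn(ref, vec) for ref in refs))
    if not digit_scores:
        raise ValueError("no digit shared between enrollment and test")
    return mean(digit_scores)
```

Because every comparison is between segments of the same digit, each individual score is computed under matched phonetic content, which is the fixed-phrase condition the abstract refers to.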
Shengyu YAO
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Ruohua ZHOU
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
Chinese Academy of Sciences, University of Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shengyu YAO, Ruohua ZHOU, Pengyuan ZHANG, "Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 2, pp. 346-354, February 2019, doi: 10.1587/transinf.2018EDP7310.
Abstract: This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which enrollment and test utterances are not of the same phrase. The core of the proposed method is making use of digit alignment information in the i-vector framework. By utilizing forced alignment information, verification scores of the testing trials can be computed in the fixed-phrase situation, in which the compared speech segments between the enrollment and test utterances are of the same phonetic content. Specifically, utterances are segmented into digits, then a unique phonetically-constrained i-vector extractor is applied to obtain speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7310/_p
@ARTICLE{e102-d_2_346,
author={Shengyu YAO and Ruohua ZHOU and Pengyuan ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings},
year={2019},
volume={E102-D},
number={2},
pages={346--354},
abstract={This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which enrollment and test utterances are not of the same phrase. The core of the proposed method is making use of digit alignment information in the i-vector framework. By utilizing forced alignment information, verification scores of the testing trials can be computed in the fixed-phrase situation, in which the compared speech segments between the enrollment and test utterances are of the same phonetic content. Specifically, utterances are segmented into digits, then a unique phonetically-constrained i-vector extractor is applied to obtain speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.},
keywords={},
doi={10.1587/transinf.2018EDP7310},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings
T2 - IEICE TRANSACTIONS on Information
SP - 346
EP - 354
AU - Shengyu YAO
AU - Ruohua ZHOU
AU - Pengyuan ZHANG
PY - 2019
DO - 10.1587/transinf.2018EDP7310
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2019
AB - This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which enrollment and test utterances are not of the same phrase. The core of the proposed method is making use of digit alignment information in the i-vector framework. By utilizing forced alignment information, verification scores of the testing trials can be computed in the fixed-phrase situation, in which the compared speech segments between the enrollment and test utterances are of the same phonetic content. Specifically, utterances are segmented into digits, then a unique phonetically-constrained i-vector extractor is applied to obtain speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
ER -