The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Within the framework of traditional power-spectrum-based feature extraction, this paper proposes a feature that uses a deep neural network to describe the nonlinear relationship between the power spectrum and discriminative information, with the aim of extracting more discriminative information for playback attack detection. The feature, constant-Q deep coefficients (CQDC), relies on the constant-Q transform, a deep neural network, and the discrete cosine transform. The constant-Q transform converts the signal from the time domain to the frequency domain, because it is a long-term transform that provides more frequency detail; the deep neural network extracts more discriminative information for separating playback speech from genuine speech; and the discrete cosine transform decorrelates the feature dimensions. The ASVspoof 2017 corpus version 2.0 is used to evaluate the performance of CQDC. The experimental results show that CQDC outperforms existing features based on the power spectrum obtained from the constant-Q transform, with equal error rate reductions ranging from 19.18% to 51.56%. In addition, the discriminative information of CQDC is found to be spread across all frequency bins, which differs from commonly used features.
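The three stages named in the abstract (constant-Q transform, deep neural network, discrete cosine transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no frame length, bin layout, network architecture, or training procedure, so every parameter below (`f_min`, `bins_per_octave`, `n_bins`, the layer sizes) is an assumption, and `make_random_mlp` is a hypothetical untrained stand-in for the trained network that only shows where it would sit in the pipeline.

```python
import numpy as np


def constant_q_log_power(frame, fs, f_min=100.0, bins_per_octave=12, n_bins=24):
    """Naive constant-Q transform of one frame, returned as log power per bin."""
    frame = np.asarray(frame, dtype=float)
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q factor
    power = np.empty(n_bins)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometrically spaced centre
        n_k = int(np.ceil(q * fs / f_k))  # window length shrinks as f_k grows
        if n_k > frame.size:
            raise ValueError("frame too short for the lowest-frequency bin")
        n = np.arange(n_k)
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * q * n / n_k)
        power[k] = np.abs(np.dot(frame[:n_k], kernel) / n_k) ** 2
    return np.log(power + 1e-12)  # small floor avoids log(0)


def make_random_mlp(sizes, rng):
    """Hypothetical stand-in for a trained DNN: random weights, no training."""
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, layers):
    """tanh hidden layers followed by a linear output layer (assumed design)."""
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)
    w, b = layers[-1]
    return x @ w + b


def dct_ii(x):
    """Unnormalised DCT-II along the last axis, used to decorrelate dimensions."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (m + 0.5) * k / n)  # basis[k, m]
    return x @ basis.T


def cqdc_sketch(frame, fs, layers):
    """CQT log power -> DNN -> DCT, in the order the abstract lists them."""
    return dct_ii(mlp_forward(constant_q_log_power(frame, fs), layers))


rng = np.random.default_rng(0)
fs = 16000
t = np.arange(4096) / fs
frame = np.sin(2 * np.pi * 200.0 * t)  # synthetic 200 Hz tone, not real speech
layers = make_random_mlp([24, 32, 24], rng)
features = cqdc_sketch(frame, fs, layers)
print(features.shape)  # (24,)
```

With random weights the output carries no discriminative information; a real system would presumably train the network on genuine and playback speech before taking its output as the feature.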
Jichen YANG
National University of Singapore
Longting XU
Donghua University
Bo REN
Microsoft Search Technology Center Asia
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jichen YANG, Longting XU, Bo REN, "Constant-Q Deep Coefficients for Playback Attack Detection," in IEICE TRANSACTIONS on Information, vol. E103-D, no. 2, pp. 464-468, February 2020, doi: 10.1587/transinf.2019EDL8115.
Abstract: Within the framework of traditional power-spectrum-based feature extraction, this paper proposes a feature that uses a deep neural network to describe the nonlinear relationship between the power spectrum and discriminative information, with the aim of extracting more discriminative information for playback attack detection. The feature, constant-Q deep coefficients (CQDC), relies on the constant-Q transform, a deep neural network, and the discrete cosine transform. The constant-Q transform converts the signal from the time domain to the frequency domain, because it is a long-term transform that provides more frequency detail; the deep neural network extracts more discriminative information for separating playback speech from genuine speech; and the discrete cosine transform decorrelates the feature dimensions. The ASVspoof 2017 corpus version 2.0 is used to evaluate the performance of CQDC. The experimental results show that CQDC outperforms existing features based on the power spectrum obtained from the constant-Q transform, with equal error rate reductions ranging from 19.18% to 51.56%. In addition, the discriminative information of CQDC is found to be spread across all frequency bins, which differs from commonly used features.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8115/_p
@ARTICLE{e103-d_2_464,
author={Jichen YANG and Longting XU and Bo REN},
journal={IEICE TRANSACTIONS on Information},
title={Constant-Q Deep Coefficients for Playback Attack Detection},
year={2020},
volume={E103-D},
number={2},
pages={464--468},
abstract={Within the framework of traditional power-spectrum-based feature extraction, this paper proposes a feature that uses a deep neural network to describe the nonlinear relationship between the power spectrum and discriminative information, with the aim of extracting more discriminative information for playback attack detection. The feature, constant-Q deep coefficients (CQDC), relies on the constant-Q transform, a deep neural network, and the discrete cosine transform. The constant-Q transform converts the signal from the time domain to the frequency domain, because it is a long-term transform that provides more frequency detail; the deep neural network extracts more discriminative information for separating playback speech from genuine speech; and the discrete cosine transform decorrelates the feature dimensions. The ASVspoof 2017 corpus version 2.0 is used to evaluate the performance of CQDC. The experimental results show that CQDC outperforms existing features based on the power spectrum obtained from the constant-Q transform, with equal error rate reductions ranging from 19.18\% to 51.56\%. In addition, the discriminative information of CQDC is found to be spread across all frequency bins, which differs from commonly used features.},
keywords={},
doi={10.1587/transinf.2019EDL8115},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Constant-Q Deep Coefficients for Playback Attack Detection
T2 - IEICE TRANSACTIONS on Information
SP - 464
EP - 468
AU - Jichen YANG
AU - Longting XU
AU - Bo REN
PY - 2020
DO - 10.1587/transinf.2019EDL8115
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2020
AB - Within the framework of traditional power-spectrum-based feature extraction, this paper proposes a feature that uses a deep neural network to describe the nonlinear relationship between the power spectrum and discriminative information, with the aim of extracting more discriminative information for playback attack detection. The feature, constant-Q deep coefficients (CQDC), relies on the constant-Q transform, a deep neural network, and the discrete cosine transform. The constant-Q transform converts the signal from the time domain to the frequency domain, because it is a long-term transform that provides more frequency detail; the deep neural network extracts more discriminative information for separating playback speech from genuine speech; and the discrete cosine transform decorrelates the feature dimensions. The ASVspoof 2017 corpus version 2.0 is used to evaluate the performance of CQDC. The experimental results show that CQDC outperforms existing features based on the power spectrum obtained from the constant-Q transform, with equal error rate reductions ranging from 19.18% to 51.56%. In addition, the discriminative information of CQDC is found to be spread across all frequency bins, which differs from commonly used features.
ER -