The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals may appear as "XNUMX".
The narrow 300-3400 Hz bandwidth of the public switched telephone network degrades speech quality. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower band of 50-300 Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on the YIN algorithm, in which threshold processing prevents false fundamental frequency detection in unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of the 300-600 Hz band. Because the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure so as to avoid attenuation and overemphasis, increasing the weight when the first formant location is lower and decreasing it when it is higher. Subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth-extended speech signal.
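The two signal-processing steps named in the abstract, fundamental frequency detection with a voicing threshold followed by harmonic sinusoidal synthesis in the missing band, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `detect_f0`, `harmonics_in_band`, and `synthesize_low_band`, the 0.15 threshold, and the frame handling are all hypothetical. The detector follows the general YIN recipe (difference function, cumulative mean normalized difference, absolute threshold). The formant-dependent amplitude weighting is not reproduced, because the abstract does not give its formula; `amplitude` is simply supplied by the caller.

```python
import math


def detect_f0(frame, fs, f_min=50.0, f_max=300.0, threshold=0.15):
    """YIN-style F0 estimate in Hz, or None when the frame looks unvoiced.

    The threshold value is an assumption chosen for illustration.
    """
    tau_min = int(fs / f_max)
    tau_max = int(fs / f_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")

    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Cumulative mean normalized difference: d(tau) / mean(d(1..tau))
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Take the first lag that dips below the threshold, then walk to the
    # local minimum. If no lag qualifies, report the frame as unvoiced.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
    return None


def harmonics_in_band(f0, band=(50.0, 300.0)):
    """Harmonics k * f0 that fall inside the missing lower band."""
    low, high = band
    return [k * f0 for k in range(1, int(high / f0) + 1) if low <= k * f0 < high]


def synthesize_low_band(f0, amplitude, n_samples, fs, band=(50.0, 300.0)):
    """Sum of equal-amplitude sinusoids at the in-band harmonics of f0.

    Returns silence when f0 is None (unvoiced frame). Equal amplitudes are a
    simplification; the paper derives the amplitude from weighted energy.
    """
    if f0 is None:
        return [0.0] * n_samples
    freqs = harmonics_in_band(f0, band)
    return [
        amplitude * sum(math.sin(2 * math.pi * f * n / fs) for f in freqs)
        for n in range(n_samples)
    ]
```

The detector works on telephone-band input because periodicity survives band-limiting: a frame containing only the 375, 500, and 625 Hz harmonics still repeats every 1/125 s, so the lag search can recover a 125 Hz fundamental even though that component itself is absent. Returning `None` for frames with no sub-threshold lag plays the role of the threshold processing that suppresses synthesis on unvoiced sounds.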
Yuya HOSODA
Osaka University
Arata KAWAMURA
Kyoto Sangyo University
Youji IIGUNI
Osaka University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuya HOSODA, Arata KAWAMURA, Youji IIGUNI, "Artificial Bandwidth Extension for Lower Bandwidth Using Sinusoidal Synthesis based on First Formant Location" in IEICE TRANSACTIONS on Fundamentals,
vol. E105-A, no. 4, pp. 664-672, April 2022, doi: 10.1587/transfun.2021EAP1044.
Abstract: The narrow bandwidth limitation of 300-3400Hz on the public switching telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower bandwidth of 50-300Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on YIN algorithm, where a threshold processing avoids the false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of 300-600Hz. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure to avoid attenuating and overemphasizing by increasing the weight when the first formant location is lower, and vice versa. Consequently, the subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth extended speech signal.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2021EAP1044/_p
@ARTICLE{e105-a_4_664,
author={Yuya HOSODA and Arata KAWAMURA and Youji IIGUNI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Artificial Bandwidth Extension for Lower Bandwidth Using Sinusoidal Synthesis based on First Formant Location},
year={2022},
volume={E105-A},
number={4},
pages={664-672},
abstract={The narrow bandwidth limitation of 300-3400Hz on the public switching telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower bandwidth of 50-300Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on YIN algorithm, where a threshold processing avoids the false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of 300-600Hz. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure to avoid attenuating and overemphasizing by increasing the weight when the first formant location is lower, and vice versa. Consequently, the subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth extended speech signal.},
keywords={},
doi={10.1587/transfun.2021EAP1044},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Artificial Bandwidth Extension for Lower Bandwidth Using Sinusoidal Synthesis based on First Formant Location
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 664
EP - 672
AU - Yuya HOSODA
AU - Arata KAWAMURA
AU - Youji IIGUNI
PY - 2022
DO - 10.1587/transfun.2021EAP1044
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E105-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2022
AB - The narrow bandwidth limitation of 300-3400Hz on the public switching telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower bandwidth of 50-300Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on YIN algorithm, where a threshold processing avoids the false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of 300-600Hz. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure to avoid attenuating and overemphasizing by increasing the weight when the first formant location is lower, and vice versa. Consequently, the subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth extended speech signal.
ER -