The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals are expressed as "XNUMX".
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal classification performance, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of "drunkards" in real life and the guiding behavior of normal people, and construct a high-accuracy lightweight model implementation methodology called the "drunkard methodology". The core idea has three parts: (1) designing a special feature transformation module, based on the different ways drunkards and ordinary people perceive information, to simulate the gradual sobering-up process and the accompanying changes in feature perception ability; (2) designing a lightweight "drunken" model that mirrors the perception processing of the normal model. This model uses a multi-scale residual-like block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a module through which the conventional model guides, and is fused with, the "drunken" model, speeding up the sobering-up process and achieving iterative optimization and accuracy improvement. Evaluation results on the official DCASE2022 Task1 dataset show that our baseline system achieves 40.4% accuracy and a loss of 2.284 with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the "drunkard" mechanism, accuracy improves to 45.2% and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MACs.
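As a rough illustration of what the reported figures imply, the sketch below computes the accuracy gain, the implied final loss (2.284 minus 0.634), and the relative parameter and MAC overheads of the "drunkard" system over the baseline. It also includes a hypothetical helper, `conv2d_cost`, that counts parameters and MACs for a single 2-D convolution under one common convention (one MAC per weight per output position, bias counted as parameters only). The paper's own counting tool and layer configuration are not given here, so the layer sizes used with the helper are illustrative assumptions, not the authors' architecture.

```python
"""Worked arithmetic on the figures reported in the abstract.

The baseline / "drunkard" numbers are copied from the abstract; conv2d_cost and
the layer sizes used with it are hypothetical illustrations, not the paper's.
"""

# Reported results (DCASE2022 Task1 evaluation, per the abstract).
BASELINE = {"accuracy_pct": 40.4, "loss": 2.284, "params_k": 442.67, "macs_m": 19.40}
DRUNKARD = {"accuracy_pct": 45.2, "params_k": 551.89, "macs_m": 23.6}
LOSS_REDUCTION = 0.634  # reported only as a reduction relative to the baseline loss


def relative_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


implied_loss = BASELINE["loss"] - LOSS_REDUCTION
accuracy_gain_pts = DRUNKARD["accuracy_pct"] - BASELINE["accuracy_pct"]
param_overhead_pct = relative_increase(DRUNKARD["params_k"], BASELINE["params_k"])
mac_overhead_pct = relative_increase(DRUNKARD["macs_m"], BASELINE["macs_m"])


def conv2d_cost(c_in: int, c_out: int, k_h: int, k_w: int,
                h_out: int, w_out: int, groups: int = 1,
                bias: bool = True) -> tuple[int, int]:
    """Return (parameters, MACs) for one 2-D convolution.

    Convention (an assumption; counting tools differ): each weight performs one
    multiply-accumulate per output position; bias adds parameters but no MACs.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = c_out * (c_in // groups) * k_h * k_w
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out
    return params, macs


if __name__ == "__main__":
    print(f"implied loss after the drunkard mechanism: {implied_loss:.3f}")
    print(f"accuracy gain: {accuracy_gain_pts:.1f} percentage points")
    print(f"parameter overhead: {param_overhead_pct:.1f}%")
    print(f"MAC overhead: {mac_overhead_pct:.1f}%")
    # Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 32x32 output map.
    print("standard conv :", conv2d_cost(64, 64, 3, 3, 32, 32))
    print("depthwise conv:", conv2d_cost(64, 64, 3, 3, 32, 32, groups=64))
```

Running it prints an implied loss of 1.650, a gain of 4.8 percentage points, and overheads of about 24.7% in parameters and 21.6% in MACs. For the hypothetical layer, the standard convolution costs (36928, 37748736) while the depthwise version costs (640, 589824), showing how grouping divides the per-layer budget by the group factor.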
Wenkai LIU
North China University of Technology
Lin ZHANG
North China University of Technology
Menglong WU
North China University of Technology
Xichang CAI
North China University of Technology
Hongxia DONG
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Wenkai LIU, Lin ZHANG, Menglong WU, Xichang CAI, Hongxia DONG, "Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 1, pp. 83-92, January 2024, doi: 10.1587/transinf.2023EDP7107.
Abstract: The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7107/_p
@ARTICLE{e107-d_1_83,
author={Wenkai LIU and Lin ZHANG and Menglong WU and Xichang CAI and Hongxia DONG},
journal={IEICE TRANSACTIONS on Information},
title={Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology},
year={2024},
volume={E107-D},
number={1},
pages={83-92},
abstract={The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.},
keywords={},
doi={10.1587/transinf.2023EDP7107},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology
T2 - IEICE TRANSACTIONS on Information
SP - 83
EP - 92
AU - Wenkai LIU
AU - Lin ZHANG
AU - Menglong WU
AU - Xichang CAI
AU - Hongxia DONG
PY - 2024
DO - 10.1587/transinf.2023EDP7107
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2024
AB - The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
ER -