Code search is the task of retrieving the most relevant code for a natural language query. Several recent studies have proposed deep-learning-based methods that use a multi-encoder model, parsing code into multiple fields to represent it. These methods improve model performance by distinguishing between similar code snippets and by using a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling fails to convey word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode the code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to identify the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code and query embeddings and compute the cosine similarity. To evaluate its performance, we compare our model with six previous models on two popular datasets. The results show that our model achieves Top@1 scores of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
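The matching steps named in the abstract (relation matrix, pooling, attention weighting, cosine similarity) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so every function name, the additive combination of intra-modal and cross-modal scores, and the use of fixed lists in place of trainable vectors are assumptions made only for illustration. It uses plain Python lists so that it runs without third-party libraries.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_matrix(code_emb, query_emb):
    # rel[i][j] = dot product of code token i and query token j
    return [[dot(c, q) for q in query_emb] for c in code_emb]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def cross_scores_maxpool(rel):
    # Max-pooling: each row keeps only its single best-matching partner.
    return [max(row) for row in rel]


def cross_scores_projected(rel, proj):
    # Assumed alternative: a weighted sum over every entry of the row.
    # `proj` stands in for a trainable vector (hypothetical name); its
    # length must equal the (padded) length of the other sequence.
    return [dot(row, proj) for row in rel]


def intra_scores(emb, ctx):
    # Assumed intra-modal scoring: each token against a context vector
    # of its own modality (`ctx` is a hypothetical trainable vector).
    return [dot(tok, ctx) for tok in emb]


def attend(emb, scores):
    # Attention-weighted sum of the token embeddings.
    weights = softmax(scores)
    dim = len(emb[0])
    return [sum(w * tok[d] for w, tok in zip(weights, emb)) for d in range(dim)]


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def match(code_emb, query_emb, code_ctx, query_ctx, code_proj, query_proj):
    # Add intra-modal and cross-modal scores (an assumed combination),
    # then compare the two attention-weighted embeddings.
    rel = relation_matrix(code_emb, query_emb)
    code_scores = [a + b for a, b in zip(
        intra_scores(code_emb, code_ctx),
        cross_scores_projected(rel, code_proj))]
    query_scores = [a + b for a, b in zip(
        intra_scores(query_emb, query_ctx),
        cross_scores_projected(transpose(rel), query_proj))]
    return cosine(attend(code_emb, code_scores), attend(query_emb, query_scores))


# Two code tokens whose best match is equally strong, but whose overall
# alignment with the query differs.
rel = [[0.9, 0.1], [0.9, 0.8]]
print(cross_scores_maxpool(rel))                # both rows get the same score
print(cross_scores_projected(rel, [0.5, 0.5]))  # the second row scores higher
```

The toy relation matrix at the end shows one reading of the information loss that the abstract attributes to max-pooling: two rows with the same maximum become indistinguishable, whereas a weighted projection over the full row keeps them apart. In a trained model the projection and context vectors would be learned rather than fixed.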
Juntong HONG
Kyoto Institute of Technology
Eunjong CHOI
Kyoto Institute of Technology
Osamu MIZUNO
Kyoto Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Juntong HONG, Eunjong CHOI, Osamu MIZUNO, "A Combined Alignment Model for Code Search" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 3, pp. 257-267, March 2024, doi: 10.1587/transinf.2023MPP0002.
Abstract: Code search is a task to retrieve the most relevant code given a natural language query. Several recent studies proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent code. These methods enhance the performance of the model by distinguishing between similar code snippets and utilizing a relation matrix to bridge the code and query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, utilizing the relation matrix that solely relies on max-pooling disregards the delivery of word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent code and use one encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare our model with six previous models on two popular datasets. The results show that our model achieves 0.614 and 0.687 Top@1 performance, outperforming the best comparison models by 12.2% and 9.3%, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023MPP0002/_p
@ARTICLE{e107-d_3_257,
author={Juntong HONG and Eunjong CHOI and Osamu MIZUNO},
journal={IEICE TRANSACTIONS on Information},
title={A Combined Alignment Model for Code Search},
year={2024},
volume={E107-D},
number={3},
pages={257-267},
abstract={Code search is a task to retrieve the most relevant code given a natural language query. Several recent studies proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent code. These methods enhance the performance of the model by distinguishing between similar code snippets and utilizing a relation matrix to bridge the code and query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, utilizing the relation matrix that solely relies on max-pooling disregards the delivery of word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent code and use one encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare our model with six previous models on two popular datasets. The results show that our model achieves 0.614 and 0.687 Top@1 performance, outperforming the best comparison models by 12.2% and 9.3%, respectively.},
keywords={},
doi={10.1587/transinf.2023MPP0002},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - A Combined Alignment Model for Code Search
T2 - IEICE TRANSACTIONS on Information
SP - 257
EP - 267
AU - Juntong HONG
AU - Eunjong CHOI
AU - Osamu MIZUNO
PY - 2024
DO - 10.1587/transinf.2023MPP0002
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2024
AB - Code search is a task to retrieve the most relevant code given a natural language query. Several recent studies proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent code. These methods enhance the performance of the model by distinguishing between similar code snippets and utilizing a relation matrix to bridge the code and query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, utilizing the relation matrix that solely relies on max-pooling disregards the delivery of word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent code and use one encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare our model with six previous models on two popular datasets. The results show that our model achieves 0.614 and 0.687 Top@1 performance, outperforming the best comparison models by 12.2% and 9.3%, respectively.
ER -