Epidemic cyber incidents are caused by malicious websites that use exploit kits. An exploit kit makes it easier for attackers to carry out drive-by download (DBD) attacks. However, malicious websites that use an exploit kit are reported to have similar website structure (WS) trees. Accordingly, techniques that identify malicious websites by means of WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of an exploit kit prevents the WS-tree from being captured perfectly. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack takes place within a certain duration. It also proposes a new malicious website identification technique that clusters the WS-trees of exploit kits. Experimental results on the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when the HTTP traffic from the malicious sites is partially lost.
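The abstract gives no implementation details, so the following is only a minimal sketch of how a WS-tree might be estimated from HTTP traffic when some requests cannot be linked. It assumes, purely for illustration, that requests are linked by their Referer header and that a request whose referer is missing or unknown is attached to the root of the tree whose time window it falls in. The names `Request`, `build_ws_trees`, and `window` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    time: float              # seconds since the start of the capture
    url: str
    referer: Optional[str]   # None when the Referer header is absent


def build_ws_trees(requests, window=10.0):
    """Group requests into trees, each covering `window` seconds from its root.

    Each tree is a dict with the root URL, its start time, and a
    parent-URL -> list-of-child-URLs mapping (assumed representation).
    """
    trees = []
    for req in sorted(requests, key=lambda r: r.time):
        current = trees[-1] if trees else None
        in_window = current is not None and req.time - current["start"] <= window
        if not in_window:
            # The request falls outside the current window: start a new tree.
            trees.append({"root": req.url, "start": req.time,
                          "children": {req.url: []}})
            continue
        nodes = current["children"]
        # Prefer the referer link; fall back to the root when the link is
        # missing (assumption: the request still belongs to this attack window).
        parent = req.referer if req.referer in nodes else current["root"]
        nodes[parent].append(req.url)
        nodes.setdefault(req.url, [])
    return trees


if __name__ == "__main__":
    reqs = [
        Request(0.0, "http://landing.example/", None),
        Request(1.0, "http://redirect.example/a", "http://landing.example/"),
        Request(2.0, "http://exploit.example/payload", None),  # referer lost
        Request(60.0, "http://other.example/", None),
    ]
    for tree in build_ws_trees(reqs):
        print(tree["root"], tree["children"])
```

In this sketch the third request has no referer but is still placed in the first tree because it arrives within the assumed 10-second window. The fourth request starts a separate tree.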
Tatsuya NAGAI
Kobe University
Masaki KAMIZONO
PwC Cyber Services
Yoshiaki SHIRAISHI
Kobe University
Kelin XIA
Nanyang Technological University
Masami MOHRI
Gifu University
Yasuhiro TAKANO
Kobe University
Masakatu MORII
Kobe University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tatsuya NAGAI, Masaki KAMIZONO, Yoshiaki SHIRAISHI, Kelin XIA, Masami MOHRI, Yasuhiro TAKANO, Masakatu MORII, "A Malicious Web Site Identification Technique Using Web Structure Clustering" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 9, pp. 1665-1672, September 2019, doi: 10.1587/transinf.2018OFP0010.
Abstract: Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit facilitates attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results assuming the D3M dataset verify that the proposed technique identifies exploit kits with a reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018OFP0010/_p
@ARTICLE{e102-d_9_1665,
author={Tatsuya NAGAI and Masaki KAMIZONO and Yoshiaki SHIRAISHI and Kelin XIA and Masami MOHRI and Yasuhiro TAKANO and Masakatu MORII},
journal={IEICE TRANSACTIONS on Information},
title={A Malicious Web Site Identification Technique Using Web Structure Clustering},
year={2019},
volume={E102-D},
number={9},
pages={1665-1672},
abstract={Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit facilitates attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results assuming the D3M dataset verify that the proposed technique identifies exploit kits with a reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.},
keywords={},
doi={10.1587/transinf.2018OFP0010},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - A Malicious Web Site Identification Technique Using Web Structure Clustering
T2 - IEICE TRANSACTIONS on Information
SP - 1665
EP - 1672
AU - Tatsuya NAGAI
AU - Masaki KAMIZONO
AU - Yoshiaki SHIRAISHI
AU - Kelin XIA
AU - Masami MOHRI
AU - Yasuhiro TAKANO
AU - Masakatu MORII
PY - 2019
DO - 10.1587/transinf.2018OFP0010
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2019
AB - Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit facilitates attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results assuming the D3M dataset verify that the proposed technique identifies exploit kits with a reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
ER -