The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. Copyrights notice
Um dos problemas mais bem estudados na mineração de dados é computar a coleção de conjuntos de itens frequentes em grandes bancos de dados transacionais. Desde a introdução do famoso algoritmo Apriori [14], muitos outros foram propostos para encontrar os conjuntos de itens frequentes. Entre esses algoritmos, a abordagem de mineração conjuntos de itens fechados despertou muito interesse na comunidade de mineração de dados. Os algoritmos que adotam esta abordagem incluem TITANIC [8], CLOSET + [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [5], TGC-Tree [16] etc. -Stream, GC-Tree e TGC-Tree são algoritmos online que funcionam em ambientes de janela deslizante. Pela avaliação de desempenho em [16], GC-Tree [15] é o mais rápido. Neste artigo, é proposto um algoritmo aprimorado baseado em GC-Tree, cuja complexidade computacional é comprovada como uma combinação linear do tamanho médio da transação e do tamanho médio do conjunto de itens fechados. O algoritmo é baseado no teorema essencial apresentado na Seção. 4.2. Empiricamente, o novo algoritmo é várias ordens de magnitude mais rápido que o algoritmo de última geração, GC-Tree.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copiar
Junbo CHEN, Bo ZHOU, Lu CHEN, Xinyu WANG, Yiqun DING, "Finding Frequent Closed Itemsets in Sliding Window in Linear Time" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 10, pp. 2406-2418, October 2008, doi: 10.1093/ietisy/e91-d.10.2406.
Abstract: One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [5], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.10.2406/_p
Copiar
@ARTICLE{e91-d_10_2406,
author={Junbo CHEN, Bo ZHOU, Lu CHEN, Xinyu WANG, Yiqun DING, },
journal={IEICE TRANSACTIONS on Information},
title={Finding Frequent Closed Itemsets in Sliding Window in Linear Time},
year={2008},
volume={E91-D},
number={10},
pages={2406-2418},
abstract={One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [5], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.},
keywords={},
doi={10.1093/ietisy/e91-d.10.2406},
ISSN={1745-1361},
month={October},}
Copiar
TY - JOUR
TI - Finding Frequent Closed Itemsets in Sliding Window in Linear Time
T2 - IEICE TRANSACTIONS on Information
SP - 2406
EP - 2418
AU - Junbo CHEN
AU - Bo ZHOU
AU - Lu CHEN
AU - Xinyu WANG
AU - Yiqun DING
PY - 2008
DO - 10.1093/ietisy/e91-d.10.2406
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2008
AB - One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [5], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.
ER -