Abstract

Mining frequent patterns in databases has emerged as an important task in knowledge discovery and data mining. In this paper, we present an efficient algorithm called Mop for fast frequent pattern discovery. Mop uses a new data structure called the OP_tree (ordered pattern tree), together with particular properties of frequent patterns, to facilitate the mining process. An OP_tree is a special frequent pattern tree in which the children of any node are sorted according to the supports of the corresponding items. Mop achieves its efficiency through three techniques: (1) it stores a large database in an OP_tree to avoid repeated database scans; (2) it finds all frequent 2-patterns during the construction of the OP_tree, avoiding the costly generation of a large number of candidate 2-patterns; and (3) it obtains the supports of candidate k-patterns (k>2) by traversing a few specific subtrees of the OP_tree, which greatly reduces the search space and avoids multiple scans of the database. We experimentally compare our algorithm with the Apriori algorithm and the FP-growth algorithm on one real database and one synthetic database. The experimental results show that Mop is about an order of magnitude faster than the Apriori algorithm. Mop also outperforms the FP-growth algorithm, especially when the support threshold is very low and the databases are quite large.
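
The abstract does not give the construction details of the OP_tree, so the following is only an illustrative sketch of the general idea, not the authors' Mop algorithm. It assumes a prefix tree whose children are kept in descending order of item support, counts 2-pattern supports while transactions are inserted, and answers a support query by visiting only the subtrees that can still contain the pattern. The names `Node`, `build_tree`, and `pattern_support`, the descending sort direction, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One tree node: an item, a count, and children in rank order."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = []  # assumption: sorted by descending item support

    def child(self, item, rank):
        """Return the child for `item`, creating it at its sorted position."""
        for i, node in enumerate(self.children):
            if node.item == item:
                return node
            if rank[node.item] > rank[item]:
                new = Node(item)
                self.children.insert(i, new)
                return new
        new = Node(item)
        self.children.append(new)
        return new


def build_tree(transactions, min_support):
    # Scan 1: count item supports and drop infrequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in support.items() if c >= min_support]
    # Rank 0 is the most frequent item; ties are broken by item name.
    order = sorted(frequent, key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = Node()
    pair_counts = Counter()
    # Scan 2: insert each transaction and count item pairs on the way.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        pair_counts.update(combinations(items, 2))
        node = root
        for item in items:
            node = node.child(item, rank)
            node.count += 1
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return root, rank, frequent_pairs


def pattern_support(node, pattern, rank):
    """Support of `pattern`, visiting only subtrees that can contain it."""
    remaining = sorted(pattern, key=rank.get)
    total = 0
    for child in node.children:
        if rank[child.item] > rank[remaining[0]]:
            break  # this and all later subtrees hold only lower-support items
        if child.item == remaining[0]:
            if len(remaining) == 1:
                total += child.count
            else:
                total += pattern_support(child, remaining[1:], rank)
        else:
            total += pattern_support(child, remaining, rank)
    return total
```

For example, `build_tree([["a", "c"], ["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]], 2)` returns a tree along with the three frequent pairs, and `pattern_support(root, ["a", "b", "c"], rank)` can then be answered from the tree alone, without another pass over the transactions.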

Keywords

Data Mining, Frequent Patterns, Ordered Pattern Trees
