Madhu Nashipudimath, Monali Deshmukh
Abstract: Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profit. Although a number of relevant algorithms have been proposed in recent years, they generate a large number of candidate itemsets for high utility itemsets. Such a large number of candidates degrades mining performance in terms of execution time and space requirements. The situation may become worse when the database contains many long transactions or long high utility itemsets. In this paper, we propose two algorithms, namely UP-Growth (Utility Pattern Growth) and UP-Growth+, for mining high utility itemsets with a set of effective strategies for pruning candidate itemsets. The information of high utility itemsets is maintained in a tree-based data structure named UP-Tree (Utility Pattern Tree) such that candidate itemsets can be generated efficiently with only two scans of the database. The performance of UP-Growth and UP-Growth+ is compared with state-of-the-art algorithms on many types of both real and synthetic datasets. Experimental results show that the proposed algorithms, especially UP-Growth+, not only reduce the number of candidates effectively but also outperform other algorithms substantially in terms of runtime, especially when databases contain many long transactions.
Keywords: Candidate pruning, frequent itemset, high utility itemset, utility mining, data mining
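
To make the notion of a high utility itemset concrete, the following is a minimal brute-force sketch, not the UP-Growth or UP-Growth+ algorithm. It assumes that the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and that an itemset is high utility when its total utility over the database reaches a user-chosen threshold. The toy transactions, the profit table, and the names `itemset_utility`, `high_utility_itemsets`, and `min_utility` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]

# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum the utility of `itemset` over every transaction that contains it."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate every itemset exhaustively (exponential in the number of items)."""
    items = sorted({item for tx in transactions for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    for itemset, utility in high_utility_itemsets(transactions, unit_profit, 10).items():
        print(itemset, utility)
```

In this toy data, the single item `b` has utility 6 and falls below a threshold of 10, while its superset `{b, d}` reaches 10. Utility is therefore not downward closed, so the support-based pruning used in frequent itemset mining does not carry over directly, and the exhaustive enumeration above quickly becomes impractical as the number of items grows.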