International Journal of Science and Research (IJSR)


ISSN: 2319-7064



M.Tech / M.E / PhD Thesis | Computer Science & Engineering | India | Volume 3 Issue 7, July 2014


Comparative Result of Building Decision Tree for Imbalanced Datasets

Kavita Rathod | Pramod Patil


Abstract: Learning from imbalanced data is an important and common problem. Several decision-tree algorithms can be applied to it, such as CART, C4.4, and HDDT; CART performs poorly on imbalanced datasets compared with the others, so it is omitted here. Sampling techniques can also be combined with decision trees to address class imbalance, but they require parameter selection and increase the complexity of the method. To overcome this drawback, a technique called the Hellinger Distance Decision Tree (HDDT) has been proposed, which uses the Hellinger distance as its splitting criterion. In addition, the skew insensitivity of the Hellinger distance and its advantage over popular criteria such as entropy (gain ratio) are evaluated. Moreover, the resulting binary trees can be easily understood by a non-expert user. We arrive at the practical conclusion that, for imbalanced data, it is sufficient to use Hellinger trees with bagging (bagged HDDT, BG) instead of sampling methods. In the prior approach, decision trees for imbalanced datasets were built either with sampling techniques or with entropy (gain ratio) as the splitting criterion, as in C4.4; the proposed approach builds the tree using the Hellinger distance instead.
The results compare decision trees built with C4.4 (gain ratio) and with HDDT (Hellinger distance) on imbalanced datasets.
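As a sketch of the splitting criterion the abstract describes (the function name and toy counts below are illustrative, not from the paper): for a candidate split, HDDT measures the Hellinger distance between the class-conditional distributions of the positive and negative examples across the split's partitions. Because each class is normalized by its own total, the value does not depend on the class ratio, which is the skew insensitivity claimed over entropy-based criteria.

```python
import math

def hellinger_split_value(pos_counts, neg_counts):
    """Hellinger-distance split criterion used by HDDT.

    pos_counts[j] / neg_counts[j] give how many positive / negative
    examples fall into partition j of a candidate split. Each class is
    normalized by its own total, so the score is insensitive to how
    skewed the overall class distribution is.
    """
    n_pos = sum(pos_counts)
    n_neg = sum(neg_counts)
    total = 0.0
    for p, n in zip(pos_counts, neg_counts):
        total += (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
    return math.sqrt(total)

# A perfectly separating binary split reaches the maximum value sqrt(2),
# even with a 10:990 class imbalance:
print(hellinger_split_value([10, 0], [0, 990]))  # ~1.41421

# Scaling one class (e.g., oversampling positives 10x) leaves the
# criterion unchanged, illustrating skew insensitivity:
print(hellinger_split_value([8, 2], [20, 80]) ==
      hellinger_split_value([80, 20], [20, 80]))  # True
```

A full HDDT implementation would evaluate this value for every candidate split and pick the maximum, in place of the gain-ratio comparison used by C4.4.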


Keywords: Imbalanced datasets, C4.4 Algorithm, HDDT, Gain Ratio, Hellinger Distance, Decision Tree


Edition: Volume 3 Issue 7, July 2014


Pages: 1679 - 1683




How to Cite this Article?

Kavita Rathod, Pramod Patil, "Comparative Result of Building Decision Tree for Imbalanced Datasets", International Journal of Science and Research (IJSR), Volume 3 Issue 7, July 2014, pp. 1679-1683, https://www.ijsr.net/get_abstract.php?paper_id=201511
