Hybrid random forests: Advantages of mixed trees in classifying text data

Baoxun Xu*, Joshua Zhexue Huang, Graham Williams, Mark Junjie Li, Yunming Ye

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)

Abstract

Random forests are a popular classification method based on an ensemble of a single type of decision tree. The literature describes many different decision tree algorithms, including C4.5, CART and CHAID, and each type of algorithm may capture different information and structures in the data. In this paper, we propose a novel random forest algorithm, called a hybrid random forest. We ensemble multiple types of decision trees into a single random forest and exploit the diversity of the trees to improve the resulting model. We conducted a series of experiments on six text classification datasets to compare our method with traditional random forest methods and several other text categorization methods. The results show that our method consistently outperforms the compared methods.
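The abstract does not say exactly how the tree types are combined, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It bags small trees over binary term-presence features and rotates each tree's split criterion among a CART-style Gini decrease, a C4.5-style gain ratio and a CHAID-inspired chi-square statistic. The round-robin mixing rule, the random feature subspace of size √M, the depth limit, the toy data and every function name (`hybrid_forest`, `build_tree`, `predict`, and so on) are hypothetical choices made for illustration. Real C4.5 and CHAID also use multiway splits, pruning and significance testing, all of which are omitted here.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

# Split scores for a binary split of labels y into left/right; higher is better.
def gini_gain(y, left, right):  # CART-style impurity decrease
    n = len(y)
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)

def gain_ratio(y, left, right):  # C4.5-style information gain ratio
    n = len(y)
    gain = entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)
    split_info = entropy([0] * len(left) + [1] * len(right))
    return gain / split_info if split_info > 0 else 0.0

def chi_square(y, left, right):  # CHAID-inspired: Pearson chi-square only, no merging/p-values
    n = len(y)
    stat = 0.0
    for cls, total in Counter(y).items():
        for branch in (left, right):
            expected = total * len(branch) / n
            if expected > 0:
                stat += (branch.count(cls) - expected) ** 2 / expected
    return stat

CRITERIA = {"cart": gini_gain, "c45": gain_ratio, "chaid": chi_square}

def majority(y):
    return Counter(y).most_common(1)[0][0]

def build_tree(X, y, score, n_feats, rng, depth=0, max_depth=5):
    """Grow a tree whose internal nodes test whether term f occurs (x[f] > 0)."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subspace
        left = [i for i, x in enumerate(X) if x[f] <= 0]
        right = [i for i, x in enumerate(X) if x[f] > 0]
        if left and right:
            s = score(y, [y[i] for i in left], [y[i] for i in right])
            if best is None or s > best[0]:
                best = (s, f, left, right)
    if best is None or best[0] <= 0:
        return majority(y)  # no useful split: make a majority-vote leaf
    _, f, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          score, n_feats, rng, depth + 1, max_depth)

    return (f, grow(left), grow(right))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, left, right)
        f, left, right = node
        node = right if x[f] > 0 else left
    return node

def hybrid_forest(X, y, n_trees=30, seed=0):
    """Bagged trees; the split criterion rotates through CRITERIA (assumed mixing rule)."""
    rng = random.Random(seed)
    n_feats = max(1, int(math.sqrt(len(X[0]))))
    kinds = list(CRITERIA)
    forest = []
    for t in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        score = CRITERIA[kinds[t % len(kinds)]]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], score, n_feats, rng))
    return forest

def predict(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])

# Toy term-presence vectors over a hypothetical vocabulary [goal, match, vote, senate].
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
y = ["sports"] * 5 + ["politics"] * 5
forest = hybrid_forest(X, y)
print(predict(forest, [1, 1, 0, 0]), predict(forest, [0, 0, 1, 1]))
```

Rotating the criterion is one simple way to obtain a mix of tree types from the same bootstrap-and-subspace machinery as a conventional random forest. Choosing the tree type at random, or weighting trees by type, would be equally plausible variants.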

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings
Pages: 147-158
Number of pages: 12
Edition: PART 1
Publication status: Published, 2012
Externally published: Yes
Event: 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012, Kuala Lumpur, Malaysia
Duration: 29 May 2012 to 1 Jun 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7301 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 29/05/12 to 1/06/12
