TY - GEN
T1 - An effective pattern based outlier detection approach for mixed attribute data
AU - Zhang, Ke
AU - Jin, Huidong
PY - 2010
Y1 - 2010
N2 - Detecting outliers in mixed attribute datasets is one of the major challenges in real-world applications. Existing outlier detection methods lack effectiveness for mixed attribute datasets, mainly because they cannot consider interactions among different types of attributes, e.g., numerical and categorical attributes. To address this issue in mixed attribute datasets, we propose a novel Pattern based Outlier Detection approach (POD). A pattern in this paper is defined to describe the majority of the data as well as to capture interactions among different types of attributes. In POD, the more an object deviates from these patterns, the higher its outlier factor. We use logistic regression to learn patterns and then formulate the outlier factor in mixed attribute datasets. A series of experimental results illustrates that POD performs statistically significantly better than several classic outlier detection methods.
AB - Detecting outliers in mixed attribute datasets is one of the major challenges in real-world applications. Existing outlier detection methods lack effectiveness for mixed attribute datasets, mainly because they cannot consider interactions among different types of attributes, e.g., numerical and categorical attributes. To address this issue in mixed attribute datasets, we propose a novel Pattern based Outlier Detection approach (POD). A pattern in this paper is defined to describe the majority of the data as well as to capture interactions among different types of attributes. In POD, the more an object deviates from these patterns, the higher its outlier factor. We use logistic regression to learn patterns and then formulate the outlier factor in mixed attribute datasets. A series of experimental results illustrates that POD performs statistically significantly better than several classic outlier detection methods.
KW - mixed attribute data
KW - outlier detection
KW - pattern based outlier detection
UR - http://www.scopus.com/inward/record.url?scp=78650791740&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-17432-2_13
DO - 10.1007/978-3-642-17432-2_13
M3 - Conference contribution
SN - 3642174310
SN - 9783642174315
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 122
EP - 131
BT - AI 2010
T2 - 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010
Y2 - 7 December 2010 through 10 December 2010
ER -