TY - GEN
T1 - Useful clustering outcomes from meaningful time series clustering
AU - Chen, Jason R.
PY - 2007
Y1 - 2007
N2 - Clustering time series data using the popular subsequence (STS) technique has been widely used in the data mining and wider communities. Recently it was concluded that the technique is meaningless, based on the findings that it produces (a) clustering outcomes for distinct time series that are not distinguishable from one another, and (b) cluster centroids that are smoothed. More recent work has since shown that (a) could be solved by introducing a lag in the subsequence vector construction process; however, we show in this paper that such an approach does not solve (b). Motivating the terminology that a clustering method which overcomes (a) is meaningful, while one which overcomes (a) and (b) is useful, we propose an approach that produces useful time series clustering. The approach is based on restricting the clustering space to extend only over the region visited by the time series in the subsequence vector space. We test the approach on a set of 12 diverse real-world and synthetic data sets and find that (a) one can distinguish between the clusterings of these time series, and (b) the centroids produced in each case retain the character of the underlying series from which they came.
KW - Clustering
KW - Subsequence-time-series clustering
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=84870553659&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781920682514
T3 - Conferences in Research and Practice in Information Technology Series
SP - 101
EP - 109
BT - Data Mining and Analytics 2007 - 6th Australasian Data Mining Conference, AusDM 2007, Proceedings
T2 - 6th Australasian Data Mining Conference, AusDM 2007
Y2 - 3 December 2007 through 4 December 2007
ER -