Abstract
Parametric, model-based algorithms learn generative models from the data, with each model corresponding to one particular cluster. Accordingly, a model-based partitional algorithm assigns each data object to its most suitable model (Clustering step) and re-estimates each parametric model using only the data from its corresponding cluster (Maximization step). This Clustering-Maximization framework has been widely used and has shown promising results in many applications, including complex variable-length data. The paper proposes the Experience-Innovation (EI) method as a natural extension of the Clustering-Maximization framework. The method has three components: 1) keep the best past experience, which as a result makes the empirical likelihood trajectory monotonic; 2) construct a new model as a function of the existing models, so that the corresponding cluster splits existing clusters with a larger number of elements and lower uniformity; 3) heuristic innovations, for example, several trials with random initial settings. We also introduce clustering regularisation based on a balanced combination of two conditions: 1) the significance of each particular cluster; 2) the difference between any two clusters. We illustrate the effectiveness of the proposed methods using a first-order Markov model applied to a large web-traffic dataset. The aim of the experiment is to explain and understand the way people interact with websites.
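The Clustering-Maximization loop, together with the "keep the best past experience" and "random restarts" components, can be sketched for first-order Markov models over variable-length sequences of discrete states (for example, page categories in a clickstream). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_markov`, `cluster_maximize`, `ei_restarts`) are hypothetical, additive (Laplace) smoothing with `alpha` is our own choice to avoid zero probabilities, and the model-splitting component and the clustering regularisation are not implemented.

```python
import math
import random
from typing import List, Sequence, Tuple

Seq = Sequence[int]  # non-empty sequence of discrete states 0..k-1 (assumption)
Model = Tuple[List[float], List[List[float]]]  # (initial probs, transition matrix)


def _normalize(counts: List[float]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def fit_markov(seqs: List[Seq], k: int, alpha: float = 1.0) -> Model:
    """Maximization step for one cluster: smoothed first-order Markov estimates."""
    init = [alpha] * k
    trans = [[alpha] * k for _ in range(k)]
    for s in seqs:
        init[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[a][b] += 1
    return _normalize(init), [_normalize(row) for row in trans]


def log_lik(seq: Seq, model: Model) -> float:
    """Log-likelihood of one sequence under a first-order Markov model."""
    init, trans = model
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll


def cluster_maximize(seqs: List[Seq], n_clusters: int, k: int,
                     n_iter: int = 20, seed: int = 0):
    """Hard Clustering-Maximization loop that also keeps the best models seen so far."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in seqs]  # random initial partition
    models = [fit_markov([s for s, l in zip(seqs, labels) if l == c], k)
              for c in range(n_clusters)]
    best_ll, best_models = -math.inf, models
    trajectory = []
    for _ in range(n_iter):
        # Clustering step: assign each sequence to its most likely model.
        labels = [max(range(n_clusters), key=lambda c: log_lik(s, models[c]))
                  for s in seqs]
        total = sum(log_lik(s, models[l]) for s, l in zip(seqs, labels))
        # "Experience": remember the best models, so the recorded trajectory
        # is monotonic even if an iteration makes things worse.
        if total > best_ll:
            best_ll, best_models = total, models
        trajectory.append(best_ll)
        # Maximization step: refit each model on its own cluster only.
        models = [fit_markov([s for s, l in zip(seqs, labels) if l == c], k)
                  for c in range(n_clusters)]
    return best_models, best_ll, trajectory


def ei_restarts(seqs: List[Seq], n_clusters: int, k: int, n_trials: int = 5):
    """Heuristic "innovation": several trials with random initial settings."""
    runs = [cluster_maximize(seqs, n_clusters, k, seed=t) for t in range(n_trials)]
    return max(runs, key=lambda r: r[1])
```

Because the smoothed re-estimation is not an exact maximum-likelihood step, the raw likelihood can occasionally drop between iterations; recording `best_ll` rather than `total` is what makes the reported trajectory non-decreasing.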
Field | Value |
---|---|
Original language | English |
Article number | 22 |
Pages (from-to) | 190-201 |
Number of pages | 12 |
Journal | Proceedings of SPIE - The International Society for Optical Engineering |
Volume | 5812 |
Publication status | Published - 2005 |
Event | Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2005, Orlando, FL, United States (28 Mar 2005 → 29 Mar 2005) |