Towards a parallel data mining toolbox

P. Christen*, M. Hegland, O. M. Nielsen, S. Roberts, P. Strazdins, T. Semenova, Irfan Altas, Timothy Hancock

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    2 Citations (Scopus)

    Abstract

    This paper presents research projects tackling two aspects of data mining. First, it discusses a toolbox that allows flexible and interactive data exploration, analysis and presentation using the scripting language Python. The toolbox can process multiple SQL queries in parallel, and it enables fast data retrieval through a supervised caching mechanism for commonly used queries. Together, these two features give fast, efficient data access, reducing the time spent on data exploration, preparation and analysis. Second, it presents an approach to predictive modelling that leads to scalable parallel algorithms for high-dimensional data collections. Scalability is an essential requirement for data mining algorithms, as those that do not scale linearly with the data size are infeasible. These algorithms are implemented in parallel and achieve almost ideal speedup. One aim of the presented research is to integrate these two aspects of data mining into an efficient but flexible toolbox that allows the experienced data miner to attack large-scale problems interactively or with batch processing.

    Original language: English
    Title of host publication: Proceedings - 15th International Parallel and Distributed Processing Symposium, IPDPS 2001
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1563-1570
    Number of pages: 8
    ISBN (Electronic): 0769509908, 9780769509907
    Publication status: Published - 2001
    Event: 15th International Parallel and Distributed Processing Symposium, IPDPS 2001 - San Francisco, United States
    Duration: 23 Apr 2001 to 27 Apr 2001

    Publication series

    Name: Proceedings - 15th International Parallel and Distributed Processing Symposium, IPDPS 2001

    Conference

    Conference: 15th International Parallel and Distributed Processing Symposium, IPDPS 2001
    Country/Territory: United States
    City: San Francisco
    Period: 23/04/01 to 27/04/01
