
Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus

S. Szaszkó, L. T. Kóczy, T. D. Gedeon

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for constructing a fuzzy pseudo-thesaurus based on word-pair co-occurrence in documents. This study shows that taking into account the Word Frequency Degree, counted over the whole corpus, makes the resulting pseudo-thesaurus usable. Parameters were found for which most of the extracted word pairs were validated as related by a human expert. The relationship among the extracted pairs and groups of words is often looser than synonymy, but they identify the frequently recurring topics of the corpus. We suggest using groups of closely related words to define different topics, and the documents were clustered on this basis.
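
The abstract does not give the formulas, so the following is only a rough sketch of how a co-occurrence-based fuzzy relation between words might be computed. It assumes a Jaccard-style degree (documents containing both words divided by documents containing either) and a corpus-wide document-frequency filter standing in for the Word Frequency Degree. The function name `pseudo_thesaurus`, the thresholds `min_doc_freq` and `min_degree`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def pseudo_thesaurus(docs, min_doc_freq=2, min_degree=0.5):
    """Return {(word_a, word_b): degree} for strongly co-occurring word pairs.

    docs is a list of token lists. Both thresholds are illustrative
    assumptions, not parameter values reported by the authors.
    """
    doc_sets = [set(tokens) for tokens in docs]

    # Document frequency of every word, counted over the whole corpus.
    doc_freq = Counter(word for words in doc_sets for word in words)

    # Assumed filter: drop words that are too rare across the corpus.
    vocab = {word for word, n in doc_freq.items() if n >= min_doc_freq}

    # Count the documents in which each remaining pair occurs together.
    pair_freq = Counter()
    for words in doc_sets:
        pair_freq.update(combinations(sorted(words & vocab), 2))

    # Fuzzy membership degree in [0, 1]: |both| / |either| (Jaccard-style).
    relation = {}
    for (a, b), both in pair_freq.items():
        either = doc_freq[a] + doc_freq[b] - both
        degree = both / either
        if degree >= min_degree:
            relation[(a, b)] = degree
    return relation


# Toy corpus, invented for illustration only.
docs = [
    ["king", "castle", "dragon"],
    ["king", "castle", "princess"],
    ["dragon", "cave", "treasure"],
    ["king", "castle"],
]
related = pseudo_thesaurus(docs)
```

In this sketch, pairs that pass the degree threshold could then be grouped into topic word sets, and each document assigned to the topic whose words it contains most often. That grouping step is likewise an assumption about how the clustering described in the abstract might be approached.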

    Original language: English
    Pages (from-to): 126-131
    Number of pages: 6
    Journal: IEEE International Conference on Fuzzy Systems
    Publication status: Published - 2005
    Event: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2005 - Reno, NV, United States
    Duration: 22 May 2005 to 25 May 2005
