Word classes in Indonesian: A linguistic reality or a convenient fallacy in natural language processing?

Meladel Mistica*, Timothy Baldwin, I. Wayan Arka

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    Abstract

    This paper examines Indonesian (Bahasa Indonesia) and the claim that the language, as spoken in regions such as Riau and Jakarta, has no noun-verb distinction. We test this claim on the language as written by a variety of Indonesian speakers, using empirical methods traditionally applied in part-of-speech induction. The only features we use are morphological patterns generated from a pre-existing morphological analyser. We find that once the distribution of the data points in our experiments matches the distribution of the text from which we gather our data, we obtain significant results that show a distinction between the class of nouns and the class of verbs in Indonesian. Furthermore, the results suggest that word classes may be labelled using morphological features alone, an approach that could be applied to out-of-vocabulary items.

    Original language: English
    Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
    Pages: 293-302
    Number of pages: 10
    Publication status: Published - 2011
    Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25, Singapore
    Duration: 16 Dec 2011 to 18 Dec 2011

    Publication series

    Name: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

    Conference

    Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
    Country/Territory: Singapore
    Period: 16/12/11 to 18/12/11
