Abstract
Low and high varieties of Indonesian and other languages of Indonesia are poorly resourced for developing human language technologies. Many languages spoken in Indonesia, even those with very large speaker populations, such as Javanese (over 80 million), are thought to be threatened languages. The teaching of the Indonesian language focuses on the prestige variety, which forms part of the unusual diglossia found in many parts of Indonesia. We developed a publicly available pipeline to scrape and clean text from the PDFs of a classic Indonesian textbook, The Indonesian Way, creating a corpus. Using the corpus and curated wordlists from a number of lexicons, I searched for instances of non-prestige varieties of Indonesian, finding that they play a limited, secondary role to formal Indonesian in this textbook. References to other languages used in Indonesia are usually made as passing comments. These methods help to determine how text teaching resources relate to and influence the language politics of diglossia and the many languages of Indonesia.
Original language | English |
---|---|
Title of host publication | Proceedings of the 4th Workshop on Computational Methods for Endangered Languages |
Editors | Miikka Silfverberg, University of British Columbia |
Place of Publication | Boulder, CO |
Publisher | University of Colorado |
Pages | 24-32 |
Edition | Volume 1 |
Publication status | Published - 2021 |
Event | Workshop on Computational Methods for Endangered Languages - Online; Duration: 1 Jan 2021 → …; https://journals.colorado.edu/index.php/computel/index |
Conference
Conference | Workshop on Computational Methods for Endangered Languages |
---|---|
Period | 1/01/21 → … |
Other | 2 Mar 2021 (AEST) |