Corpus and dictionary files for 2023

Danielle Barth (Creator)

Research output: Non-textual form / Digital work

Abstract

A compiled Matukar Panau corpus of 196,000+ words, produced from newly and previously collected data. It includes words in context, speaker metadata, file metadata and, where available, parsing, glossing and translations. A subset of this corpus is included in a separate file as a morpheme corpus with parsing and glossing of 74,000+ morphemes.

Most files have been standardized for spelling. The spelling standardization script package for ELAN was developed by Jake Farrell, AI Specialist at Appen, for use by CoEDL researchers. A lexicon from ELAN in XML format is included.

An annotation guideline for clause chains is included. These annotations are in tiers with the ELAN type "chain" and are used in an accepted paper: Barth, D. & Ross, M. (In Press). Clause chaining in Matukar Panau (Oceanic). In H. Sarvasy & A. Aikhenvald (Eds.), Clause chaining in the languages of the world. Oxford University Press.

An annotation guideline for directional construction patterns is included, produced by Kira Davey as part of her ANU honours project "A quantitative study of directional constructions in Matukar Panau" (https://openresearch-repository.anu.edu.au/handle/1885/274367). These annotations are in tiers with the ELAN type "DIR".
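ELAN stores its annotation files (.eaf) as XML, and each TIER element names its tier type in a LINGUISTIC_TYPE_REF attribute. The sketch below shows one way a user might pull out the "chain" or "DIR" annotations described above using only Python's standard library. It assumes the tier type IDs in the corpus files are literally "chain" and "DIR"; the function names are hypothetical and are not part of the published files or the spelling standardization package.

```python
import xml.etree.ElementTree as ET


def annotations_in_root(root, tier_type):
    """Return (tier_id, value) pairs for every annotation on tiers whose
    LINGUISTIC_TYPE_REF equals tier_type (assumed: "chain" or "DIR")."""
    results = []
    for tier in root.iter("TIER"):
        if tier.get("LINGUISTIC_TYPE_REF") != tier_type:
            continue
        # ANNOTATION_VALUE sits inside either an ALIGNABLE_ANNOTATION
        # (time-aligned) or a REF_ANNOTATION (dependent on a parent tier).
        for value in tier.iter("ANNOTATION_VALUE"):
            results.append((tier.get("TIER_ID"), (value.text or "").strip()))
    return results


def annotations_by_tier_type(eaf_path, tier_type):
    """Parse an .eaf file from disk and collect annotations of one tier type."""
    return annotations_in_root(ET.parse(eaf_path).getroot(), tier_type)
```

For example, `annotations_by_tier_type("recording.eaf", "DIR")` (hypothetical file name) would list every directional-construction annotation in that file. Filtering on the tier type rather than the tier name lets the same function work across speakers, since each speaker typically has their own tier of a given type.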
Original language: English
DOIs
Publication status: Published, 2023

