Abstract
The Yam languages of southern New Guinea form one of the world’s primary language families, i.e. a family not related to any other in the world (Evans et al., 2017). They are the fourth-largest family in New Guinea by number of languages, after the Trans-New Guinea, Torricelli, and Sepik-Ramu families, but, owing to recent and intensive documentation efforts, they are the second-most documented. Building on this fieldwork, we propose a fine-grained, quantitatively grounded phylogeny of these languages that represents a history of the family.
The major documentation comes in the form of three descriptive grammars (Carroll, 2016; Döhler, 2018; Siegel, 2023), a sketch grammar and language documentation project on the previously little-known Yei branch (Carroll, in press), and PhD theses describing aspects of Yam languages, such as the grammar of Ranmo (Lee, 2016) and variation in Nmbo (Kashima, 2020). There are also many papers on languages of the family, too numerous to mention here.
The extended and rigorous documentation effort has created the space for developing new knowledge on the history of the family, of which we currently know very little. Preliminary results on historical reconstructions were published by Evans et al. (2017), with updated results currently in press (Evans et al., in press). This paper builds on these works and on previous conference presentations (Evans et al., 2017b) to produce a high-definition phylogeny of the language family.
The phylogeny is built using lexical data drawn from an expanded version of the Yamfinder lexical database (Carroll et al., online; yamfinder.com). The data comprise a list of 388 core vocabulary items, chosen for relevance in the region, from the 26 Yam languages for which we have data. This list has been annotated for lexical cognates based on the existing historical reconstructions (Evans et al., 2017, 2017b, in press). These cognate data serve as evidence of shared history. We provide a preliminary phylogeny in Figure 1 using these cognates. In this tree, the cognates were used to calculate symmetric generalized Robinson-Foulds distances between languages using LingPy (List & Forkel, 2021). The tree was generated from these distances using a basic UPGMA (unweighted pair group method with arithmetic mean) clustering algorithm.
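As a rough illustration of the clustering step only, the following is a minimal UPGMA sketch in plain Python. It does not reproduce the LingPy workflow or the distance calculation used for Figure 1: the function name `upgma`, the placeholder labels A to D, and the toy distances are all assumptions made for the example, not Yam data.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as nested tuples.

    names: list of taxon labels (hypothetical placeholders here)
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    """
    sizes = {name: 1 for name in names}  # active clusters and their leaf counts
    d = dict(dist)                       # working copy of the distances
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # The distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances, which equals the
        # arithmetic mean over all leaf pairs.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Toy distances between four placeholder taxa (invented for illustration).
toy_names = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.6,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "C")): 0.6,
    frozenset(("B", "D")): 0.7,
    frozenset(("C", "D")): 0.3,
}
toy_tree = upgma(toy_names, toy_dist)  # A groups with B, C groups with D
```

The sketch returns only the branching order; branch lengths, which UPGMA also yields as half the merge distance, are left out for brevity.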
We extend the automated approach presented in Figure 1 by building a Bayesian phylogeny from the cognate data, providing a high-resolution branching-tree model of the language family’s history (Bouckaert et al., 2019). We then use this model to calculate concordance factors for each cognate set (Minh et al., 2020). Concordance factors measure the extent to which a particular cognate set agrees with the branching model, and identify cognates that present alternative branching histories within this complex linguistic region.
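To make the idea of agreement between a cognate set and a tree concrete, the sketch below computes a much simpler quantity than the concordance factors of Minh et al. (2020): the share of clades in a rooted tree with which a cognate set is compatible, meaning the set is nested in the clade, contains it, or is disjoint from it. This is a stand-in chosen for illustration, not the method used in the paper; the function names, the nested-tuple tree, and the labels A to D are all assumptions.

```python
def clades(tree):
    """Return the leaf set of every internal node in a nested-tuple tree."""
    found = []

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf label
        leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.append(leaf_set)
        return leaf_set

    leaves(tree)
    return found


def compatibility_share(tree, cognate_set):
    """Share of clades that the cognate set does not cut across.

    A cognate set (the languages attesting it) is compatible with a clade
    when it lies inside the clade, contains the clade, or shares no
    language with it.
    """
    s = frozenset(cognate_set)
    all_clades = clades(tree)
    compatible = [c for c in all_clades if s <= c or c <= s or not (s & c)]
    return len(compatible) / len(all_clades)


# Hypothetical four-taxon tree with placeholder labels (not Yam data).
example_tree = (("A", "B"), ("C", "D"))
tree_like = compatibility_share(example_tree, {"A", "B"})    # fits every clade
conflicting = compatibility_share(example_tree, {"B", "C"})  # cuts across two clades
```

A cognate set with a low share under a measure like this would be a candidate for an alternative history, such as borrowing across branches, which is the kind of signal the concordance analysis is meant to surface.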
The results are a significant step forward in modelling the history of Papuan languages. The phylogeny is arguably the most detailed and accurate of any Papuan family to date, with the potential exception of Trans-New Guinea (Greenhill, in press). It also represents the next step in the long-term goal of unlocking potential deeper time-depth relationships in the region, beyond what is possible with current comparative methods.
Original language | English |
---|---|
Pages | 165 |
Number of pages | 166 |
Publication status | Published - 26 Nov 2024 |
Event | Australian Linguistic Society Annual Conference - University of Melbourne, Australia Duration: 1 Jan 2013 → … |
Conference
Conference | Australian Linguistic Society Annual Conference |
---|---|
Country/Territory | Australia |
Period | 1/01/13 → … |
Other | 1-4 October 2013 |