TY - JOUR
T1 - Compositional data analysis tutorial.
AU - Smithson, Michael
AU - Broomell, Stephen B.
N1 - Publisher Copyright: © 2022 American Psychological Association
PY - 2022
Y1 - 2022
AB - This article presents techniques for dealing with a form of dependency in data arising when numerical data sum to a constant for individual cases, that is, “compositional” or “ipsative” data. Examples are percentages that sum to 100, and hours in a day that sum to 24. Ipsative scales fell out of fashion in psychology during the 1960s and 1970s due to a lack of methods for analyzing them. However, ipsative scales have merits, and compositional data commonly occur in psychological research. Moreover, as we demonstrate, sometimes converting data to a compositional form yields insights not otherwise accessible. Fortunately, there are sound methods for analyzing compositional data. We seek to enable researchers to analyze compositional data by presenting appropriate techniques and illustrating their application to real data. First, we elaborate the technical details of compositional data and discuss both established and new approaches to their analysis. We then present applications of these methods to real social science datasets (data and code using R are available in a supplementary document). We conclude with a discussion of the state of the art in compositional data analysis and remaining unsolved problems. A brief guide to available software resources is provided in the first section of the supplementary document. Translational Abstract: Psychological researchers sometimes must deal with numerical data that has a constant sum for each case in the sample. For instance, the amounts of time out of a 24-hr day that a person devotes to sleep, eating, work, recreation, and all other activities must sum to 24 hr. Likewise, the percentages of a person’s income allocated to food, rent, clothing, transportation, all other expenses, and savings must sum to 100%. These are known as “compositional data” in some disciplines, and traditionally as “ipsative data” in psychology. Researchers in psychology during the past several decades have had difficulties in analyzing compositional data because of the constant-sum requirement, and as a result, tended to avoid this kind of data. Fortunately, straightforward techniques for analyzing compositional data have been developed since the 1980s and software resources are available for them. We elaborate these techniques and demonstrate their application to real data. We also discuss the state of the art in compositional data analysis, including unsolved problems and new approaches. This article has two goals: enabling researchers to analyze compositional data, and persuading them that analyzing data from a compositional standpoint can be useful.
KW - Beta regression
KW - Compositional data
KW - Copula
KW - Ipsative data
KW - Log-ratio
UR - http://www.scopus.com/inward/record.url?scp=85125059880&partnerID=8YFLogxK
U2 - 10.1037/met0000464
DO - 10.1037/met0000464
M3 - Article
SN - 1082-989X
JO - Psychological Methods
JF - Psychological Methods
ER -