Compositional data analysis tutorial.

Michael Smithson*, Stephen B. Broomell

*Corresponding author for this work

    Research output: Contribution to journal, Article, peer-review

    10 Citations (Scopus)

    Abstract

    This article presents techniques for dealing with a form of dependency in data arising when numerical data sum to a constant for individual cases, that is, “compositional” or “ipsative” data. Examples are percentages that sum to 100, and hours in a day that sum to 24. Ipsative scales fell out of fashion in psychology during the 1960s and 1970s due to a lack of methods for analyzing them. However, ipsative scales have merits, and compositional data commonly occur in psychological research. Moreover, as we demonstrate, sometimes converting data to a compositional form yields insights not otherwise accessible. Fortunately, there are sound methods for analyzing compositional data. We seek to enable researchers to analyze compositional data by presenting appropriate techniques and illustrating their application to real data. First, we elaborate the technical details of compositional data and discuss both established and new approaches to their analysis. We then present applications of these methods to real social science datasets (data and code using R are available in a supplementary document). We conclude with a discussion of the state of the art in compositional data analysis and remaining unsolved problems. A brief guide to available software resources is provided in the first section of the supplementary document.

    Translational Abstract

    Psychological researchers sometimes must deal with numerical data that has a constant sum for each case in the sample. For instance, the amounts of time out of a 24-hr day that a person devotes to sleep, eating, work, recreation, and all other activities must sum to 24 hr. Likewise, the percentages of a person’s income allocated to food, rent, clothing, transportation, all other expenses, and savings must sum to 100%. These are known as “compositional data” in some disciplines, and traditionally as “ipsative data” in psychology. Researchers in psychology during the past several decades have had difficulties in analyzing compositional data because of the constant-sum requirement, and as a result, tended to avoid this kind of data. Fortunately, straightforward techniques for analyzing compositional data have been developed since the 1980s and software resources are available for them. We elaborate these techniques and demonstrate their application to real data. We also discuss the state of the art in compositional data analysis, including unsolved problems and new approaches. This article has two goals: enabling researchers to analyze compositional data, and persuading them that analyzing data from a compositional standpoint can be useful.
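    As a rough illustration of the constant-sum requirement described above, the following Python sketch rescales a set of parts to a fixed total and then applies a centered log-ratio transform. The log-ratio transform is one widely used technique from the general compositional data literature; the abstract does not name the specific methods the article covers, so treating it as representative is an assumption. The function names `close` and `clr` and the hours data are hypothetical, and the article's own supplementary code is in R, not Python.

```python
import math

def close(parts, total=1.0):
    """Rescale non-negative parts so that they sum to `total`."""
    s = sum(parts)
    if s <= 0:
        raise ValueError("parts must have a positive sum")
    return [total * p / s for p in parts]

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical time budget: sleep, eating, work, recreation, other.
hours = [8.0, 2.0, 8.0, 3.0, 3.0]

day = close(hours, total=24.0)      # hours in a day, summing to 24
shares = close(hours, total=100.0)  # the same budget as percentages

# The transform depends only on the ratios between parts,
# so both scalings give the same coordinates.
clr_day = clr(day)
clr_shares = clr(shares)
```

    Because each part is expressed relative to the geometric mean of all parts, the transformed values do not depend on whether the composition is recorded in hours or in percentages, and they always sum to zero. Zero-valued parts are rejected here; handling them is a separate problem that this sketch does not address.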

    Original language: English
    Journal: Psychological Methods
    Publication status: Published - 2022
