TY - JOUR
T1 - Characterising reproducibility debt in scientific software
T2 - A systematic literature review
AU - Hassan, Zara
AU - Treude, Christoph
AU - Norrish, Michael
AU - Williams, Graham
AU - Potanin, Alex
N1 - Publisher Copyright: © 2024
PY - 2025
Y1 - 2025/04//
AB - Context: In scientific software, the inability to reproduce results is often due to technical issues and challenges in recreating the full computational workflow from the original analysis. We conceptualise this problem as Reproducibility Debt (RpD). Much research has been performed to propose solutions to tackle these issues across various computational science disciplines. It is essential to identify and accumulate existing knowledge on reproducibility issues and state-of-the-art solutions so as to provide researchers and practitioners with information that enables further research activities and RpD management in practice. Objective: In the context of scientific software, we aim to characterise RpD by providing a taxonomy of issues contributing towards its emergence and identification (causes, effects) and the common solutions discussed in the existing literature. Method: We conducted a systematic literature review, considering 2198 studies until January 2024, including 214 primary studies. Results: We propose the first taxonomy of RpD items consisting of 37 causes attributed towards its emergence, 63 corresponding effects under seven main categories, and 29 prevention strategies. We also identify 39 specialised tools/frameworks supporting reproducibility. Conclusion: The main contributions of this work are (1) a formal definition of RpD; (2) a taxonomy of issues contributing towards RpD; (3) a list of causes and effects having implications for software professionals to identify and measure RpD in their projects; (4) a list of strategies and tools to prevent or remove RpD; (5) the identification of gaps in existing research to guide future studies.
KW - Computational reproducibility
KW - Reproducibility debt
KW - Scientific software
KW - Systematic literature review
KW - Technical debt
UR - http://www.scopus.com/inward/record.url?scp=85214288894&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2024.112327
DO - 10.1016/j.jss.2024.112327
M3 - Article
AN - SCOPUS:85214288894
SN - 0164-1212
VL - 222
JO - Journal of Systems and Software
JF - Journal of Systems and Software
M1 - 112327
ER -