TY - JOUR
T1 - The problem of evaluating automated large-scale evidence aggregators
AU - Wüthrich, Nicolas
AU - Steele, Katie
N1 - Publisher Copyright:
© 2017, Springer Nature B.V.
PY - 2019/8/15
Y1 - 2019/8/15
N2 - In the biomedical context, policy makers face a large amount of potentially discordant evidence from different sources. This prompts the question of how this evidence should be aggregated in the interests of best-informed policy recommendations. The starting point of our discussion is Hunter and Williams’ recent work on an automated aggregation method for medical evidence. Our negative claim is that it is far from clear what the relevant criteria for evaluating an evidence aggregator of this sort are. What is the appropriate balance between explicitly coded algorithms and implicit reasoning involved, for instance, in the packaging of input evidence? In short: What is the optimal degree of ‘automation’? On the positive side: We propose the ability to perform an adequate robustness analysis (which depends on the nature of the input variables and parameters of the aggregator) as the focal criterion, primarily because it directs efforts to what is most important, namely, the structure of the algorithm and the appropriate extent of automation. Moreover, where there are resource constraints on the aggregation process, one must also consider what balance between volume of evidence and accuracy in the treatment of individual evidence best facilitates inference. There is no prerogative to aggregate the total evidence available if this would in fact reduce overall accuracy.
KW - Evidence aggregation
KW - Evidence-based medicine
KW - Robustness analysis
KW - Statistical meta-analysis
UR - http://www.scopus.com/inward/record.url?scp=85035101099&partnerID=8YFLogxK
U2 - 10.1007/s11229-017-1627-1
DO - 10.1007/s11229-017-1627-1
M3 - Article
SN - 0039-7857
VL - 196
SP - 3083
EP - 3102
JO - Synthese
JF - Synthese
IS - 8
ER -