TY - JOUR
T1 - A survey of the statistical power of research in behavioral ecology and animal behavior
AU - Jennions, Michael D.
AU - Møller, Anders Pape
PY - 2003/5
Y1 - 2003/5
N2 - We estimated the statistical power of the first and last statistical test presented in 697 papers from 10 behavioral journals. First tests had significantly greater statistical power and reported more significant results (smaller p values) than did last tests. This trend was consistent across journals, taxa, and the type of statistical test used. On average, statistical power was 13-16% to detect a small effect and 40-47% to detect a medium effect. This is far lower than the general recommendation of a power of 80%. By this criterion, only 2-3%, 13-21%, and 37-50% of the tests examined had the requisite power to detect a small, medium, or large effect, respectively. Neither p values nor statistical power varied significantly across the 10 journals or 11 taxa. However, mean p values of first and last tests were significantly correlated across journals (r = .67, n = 10, p = .034), with a similar trend for mean power (r = .63, n = 10, p = .051). There is therefore some evidence that power and p values are repeatable among journals. Mean p values or power of first and last tests were, however, uncorrelated across taxa. Finally, there was a significant correlation between power and reported p value for both first (r = .13, n = 684, p = .001) and last tests (r = .16, n = 654, p < .0001). If true effect sizes are unrelated to study sample sizes, the average true effect size must be nonzero for this pattern to emerge. This suggests that failure to observe significant relationships is partly owing to small sample sizes, as power increases with sample size.
KW - Effect size
KW - Meta-analysis
KW - Publication bias
KW - Sample sizes
KW - Statistical power
UR - http://www.scopus.com/inward/record.url?scp=0038322948&partnerID=8YFLogxK
U2 - 10.1093/beheco/14.3.438
DO - 10.1093/beheco/14.3.438
M3 - Article
SN - 1045-2249
VL - 14
SP - 438
EP - 445
JO - Behavioral Ecology
JF - Behavioral Ecology
IS - 3
ER -