I’ve been doing research into open science and the replication crisis in psychology recently for work (to make a case for open science practices in the education technology industry), and stumbled across this article on the replication crisis in psychology by Ulrich Schimmack. One particular part of the article caught my attention (highlighted in the image below).
The idea here, as I interpret it, is that if psychology studies are conducted at 80% power on average, then the replication rate in psychology should be around 80% — assuming everything is done correctly. The replication rate in psychology, however, is very clearly not at 80%. A quick summary of the major replication projects in psychology, such as the Open Science Collaboration and the Many Labs efforts, points to a replication rate closer to 46%.
So, assuming a standard study power of 80%, we would expect that 20% of studies wouldn’t replicate even if everything was done correctly in the replication studies. But over 50% of psychology studies fail to replicate, and psychology studies aren’t powered at 80% — not even close. This comment in Ulrich’s article reminded me of a 2018 study published in Psychological Bulletin on the average power of psychology studies across sub-disciplines. The result? The average power of psychology studies is a whopping 36% (just shy of 80%, yea?), with social psychology — the main target of replication efforts — at just over 30%. Only around 20% of social psychology studies are adequately powered, according to this study.
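To make that 36% figure concrete, here is a quick Monte Carlo sketch of the power of a “typical” psychology study. The design parameters are my own illustrative assumptions (not from the Psychological Bulletin paper): a two-group comparison with 30 participants per group and a true effect of d = 0.4, both common values in the field.

```python
# Monte Carlo estimate of power for an assumed "typical" psychology study:
# two independent groups, n = 30 per group, true effect d = 0.4, alpha = .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, d, alpha, n_sims = 30, 0.4, 0.05, 20_000

hits = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(d, 1.0, n_per_group)  # true effect of d standard deviations
    _, p = stats.ttest_ind(treatment, control)
    hits += p < alpha  # count significant results

print(f"Estimated power: {hits / n_sims:.2f}")
```

Under these assumptions the estimated power comes out at roughly a third — right in the neighborhood of the 30–36% averages reported in the study.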
What does this mean, then? Well, if the average power of psychology studies is only 36%, then we would expect that, on average, about 64% of psychology studies wouldn’t replicate — which is pretty close to the estimated 54% of failed replications across the major replication effort projects. So, in a way, the “shockingly” low replication rates are completely expected given the dismal power of psychology studies, on average.
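The arithmetic above can be sketched in a few lines of simulation. The (deliberately generous) assumptions are mine: every published effect is real, replications are run exactly like the originals, and only significant original results get replicated — so the replication rate should simply track the field’s average power.

```python
# Sketch: under ideal conditions (all effects real, replications identical to
# originals), the replication rate approximately equals the average power.
# The two power values (36% average, 80% "standard") come from the text.
import numpy as np

rng = np.random.default_rng(0)
n_studies = 50_000

for power in (0.36, 0.80):
    # Original studies come out significant with probability = power.
    original_sig = rng.random(n_studies) < power
    # Replicate only the significant ("published") findings, at the same power.
    replication_sig = rng.random(original_sig.sum()) < power
    print(f"power = {power:.0%}: replication rate = {replication_sig.mean():.0%}")
```

At 36% average power the simulated replication rate lands near 36% — i.e., about 64% of replications fail, close to the observed 54%.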
But there is reason to suspect that the replication rate of around 46% is actually pretty darn high, all things considered. I was curious, so I conducted rigorous Twitter polls to see what the masses thought. Below are the two polls. The first was trying to get at what my biased Twitter audience thought the replication rate would be, assuming that all psychology studies have on average 80% power. The majority thought the replication rate would be closest to around 50% — close to the current rate. The second poll simply asked if the statement by Ulrich (highlighted in the opening paragraph of this post) was true or false. A close poll indicated that it was ‘true’, but many thought it was false.
Finally, once all the other methodological issues are considered (e.g., variation in the replication study, poor measurement, inaccurate effect sizes, etc.), the expected replication rate — even with 80% average power — would likely be lower than 80%. All of this therefore leads me to the conclusion that, given the dismal average power of psychology studies, in addition to other methodological problems (e.g., measurement problems) and research process biases (e.g., p-hacking, HARKing), the psychology replication rate of around 46% is way higher than what would be expected. Does this mean that it is good? Absolutely not.
One Reply to “Are Replication Rates in Psychology Actually Better Than Expected? (Despite Them Still Being Terrible, Of Course)”
I must be missing something. Surely the “power” of a test reflects the ability to detect an effect if it is truly present — i.e., one minus the probability of failing to reject the null hypothesis when the alternative hypothesis is true (the Type II, or beta, error). Replication studies don’t attempt to confirm negative results: they seek to confirm a positive result. Surely in that case it is the alpha value that is important, usually 5%. So we would expect only 5% of studies that show an effect to fail to replicate because they were in fact false positives (Type I error). I must be thinking about this incorrectly, but how?