A few weeks ago, I did a little post-publication peer review. It was a novel experience for me, and led me to consider the broader purpose of post-pub peer review. In particular, I was reminded of the quarrel between Simone Schnall and Brent Donnellan (and others) back in 2014. Schnall et al. suggested an embodied cognition phenomenon wherein incidental cues of cleanliness influenced participants' ratings of moral disgust. Donnellan et al. ran replications and failed to detect the effect. An uproar ensued, goaded on by some vehement language from high-profile individuals on either side of the debate.
One thing about Schnall's experience stays with me today. In a blogpost, she summarizes her responses to a number of frequently asked questions. One answer is particularly important for anybody interested in post-publication peer review.
Question 10: “What has been your experience with replication attempts?”
My work has been targeted for multiple replication attempts; by now I have received so many such requests that I stopped counting. Further, data detectives have demanded the raw data of some of my studies, as they have done with other researchers in the area of embodied cognition because somehow this research area has been declared “suspect.” I stand by my methods and my findings and have nothing to hide and have always promptly complied with such requests. Unfortunately, there has been little reciprocation on the part of those who voiced the suspicions; replicators have not allowed me input on their data, nor have data detectives exonerated my analyses when they turned out to be accurate.
I invite the data detectives to publicly state that my findings lived up to their scrutiny, and more generally, share all their findings of secondary data analyses. Otherwise only errors get reported and highly publicized, when in fact the majority of research is solid and unproblematic.
[Note: Donnellan and colleagues were not among these data detectives. They did only the commendable job of performing replications and reporting the null results. I mention Donnellan et al. only to provide context -- it's my understanding that the failure to replicate led to third-party detectives' attempts to detect wrongdoing through analysis of the original Schnall et al. dataset. It is these attempts to detect wrongdoing that I refer to below.]
It is only fair that these data detectives report their analyses and how they failed to detect wrongdoing. I don't believe Schnall's phenomenon for a second, but the post-publication reviewers could at least report that they don't find evidence of fraud.
Data detectives themselves can run the risk of p-hacking and selective reporting. Imagine ten detectives run ten tests each. Even if nothing is wrong with the data and all the tests are independent, with a hundred tests it is all but certain that at least one will emerge with a very small p-value by chance alone. If anyone is going to make accusations according to "trial by p-value," then we had damn well better consider the problems of multiple comparisons and the garden of forking paths.**
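To put a rough number on that thought experiment, here is a minimal Python sketch. It rests on assumptions of mine rather than anything from the actual Schnall analyses: every null hypothesis is true (so each p-value is uniform on [0, 1]), all 100 tests are independent, and the alarm threshold is the conventional .05. The function names are made up for illustration.

```python
import random


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent true-null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def simulate_false_alarm_rate(n_tests: int, alpha: float = 0.05,
                              n_sims: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo check: fraction of simulated 'investigations' in which
    the smallest of n_tests null p-values falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Under a true null, a p-value is a uniform draw on [0, 1].
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    n = 10 * 10  # ten detectives, ten tests each
    print(f"Analytic chance of at least one p < .05: {familywise_error_rate(n):.3f}")
    print(f"Simulated chance:                        {simulate_false_alarm_rate(n):.3f}")
    print(f"Bonferroni per-test cutoff:              {bonferroni_threshold(n):.4f}")
```

Under these assumptions the analytic false-alarm rate is 1 - 0.95^100, or roughly 0.994, so a lone small p-value among a hundred looks is close to guaranteed. Real tests on one dataset are rarely independent, so treat this as an illustration of the problem and not as a correction procedure.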
Post-publication peer review is often viewed as a threat, but it can and should be a boon, when appropriate. A post-pub review that finds no serious problems is encouraging, and should be reported and shared.* By contrast, if every data request is a prelude to accusations of error (or worse), then it becomes upsetting to learn that somebody is looking at your data. But data inspection should not imply that there are suspicions or serious concerns. Data requests and data sharing should be the norm -- they cannot be a once-in-a-career disaster.
Post-pub peer review is too important to be just a form of witch-hunting.
I do post-publication peer review because I generally don't trust the literature. I don't believe results until I can crack them open and run my fingers through the goop. I'm a tremendous pain in the ass. But I also want to be fair. My credibility, and the value of my peer reviews, depends on it.
The Court of Salem reels in terror at the perfect linearity of Jens Forster's sample means.
* Sakaluk, Williams, and Biernat (2014) suggest that, during pre-publication peer review, one reviewer run the code to make sure they get the same statistics. This would cut down on the number of misreported statistics. Until that process is a common part of pre-publication peer review, catching misreported statistics will remain a benefit of post-publication peer review.
** Simonsohn, Simmons, and Nelson suggest specification-curve analysis, which takes the brute-force approach to this problem by reporting every possible p-value from every possible model. It's cool, but I haven't tried to implement it yet.