A psychologist's thoughts on how and why we play games

Wednesday, May 4, 2016

Post-pub peer review should be transparent too

A few weeks ago, I did a little post-publication peer review. It was a novel experience for me, and it led me to consider the broader purpose of post-pub peer review.
In particular, I was reminded of the quarrel between Simone Schnall and Brent Donnellan (and others) back in 2014. Schnall et al. suggested an embodied cognition phenomenon wherein incidental cues of cleanliness influenced participants' ratings of moral disgust. Donnellan et al. ran replications and failed to detect the effect. An uproar ensued, goaded on by some vehement language by high-profile individuals on either side of the debate.

One thing about Schnall's experience stays with me today. In a blogpost, she summarizes her responses to a number of frequently asked questions. One answer is particularly important for anybody interested in post-publication peer review.
Question 10: “What has been your experience with replication attempts?”
My work has been targeted for multiple replication attempts; by now I have received so many such requests that I stopped counting. Further, data detectives have demanded the raw data of some of my studies, as they have done with other researchers in the area of embodied cognition because somehow this research area has been declared “suspect.” I stand by my methods and my findings and have nothing to hide and have always promptly complied with such requests. Unfortunately, there has been little reciprocation on the part of those who voiced the suspicions; replicators have not allowed me input on their data, nor have data detectives exonerated my analyses when they turned out to be accurate.
I invite the data detectives to publicly state that my findings lived up to their scrutiny, and more generally, share all their findings of secondary data analyses. Otherwise only errors get reported and highly publicized, when in fact the majority of research is solid and unproblematic.
[Note: Donnellan and colleagues were not among these data detectives. They did only the commendable job of performing replications and reporting the null results. I mention Donnellan et al. only to provide context -- it's my understanding that the failure to replicate led to third-party detectives' attempts to detect wrongdoing through analysis of the original Schnall et al. dataset. It is these attempts to detect wrongdoing that I refer to below.]

It is only fair that these data detectives report their analyses and how they failed to detect wrongdoing. I don't believe Schnall's phenomenon for a second, but the post-publication reviewers could at least report that they don't find evidence of fraud.

Data detectives themselves run the risk of p-hacking and selective reporting. Imagine ten detectives run ten tests each. Even if the data are perfectly clean, with a hundred independent tests it is all but guaranteed that at least one will come out at p < .05, and a p-value near .01 would be unremarkable. If anyone is going to make accusations according to "trial by p-value," then we had damn well better consider the problems of multiple comparisons and the garden of forking paths.
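To put rough numbers on the ten-detectives scenario, here is a minimal sketch in Python. It assumes every test is independent and every null hypothesis is true (that is, there is no wrongdoing to find); the function names are my own, made up for illustration, and the Bonferroni correction is just one simple way to adjust the threshold, not the only one.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests gives p < alpha
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


n_tests = 10 * 10  # ten detectives, ten tests each

# How often does a clean dataset produce at least one "suspicious" p-value?
for alpha in (0.05, 0.01, 0.001):
    rate = familywise_error_rate(n_tests, alpha)
    print(f"P(at least one p < {alpha}) = {rate:.3f}")

# The same question after correcting the threshold for 100 tests.
corrected = bonferroni_threshold(n_tests)
print(f"Bonferroni threshold: {corrected:.4f}")
print(f"Familywise rate at that threshold: "
      f"{familywise_error_rate(n_tests, corrected):.3f}")
```

Under these assumptions, a hundred uncorrected tests at the .05 level will flag an innocent dataset almost every time, which is why a lone small p-value from an unreported pile of tests should not count for much.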

Post-publication peer review is often viewed as a threat, but it can and should be a boon, when appropriate. A post-pub review that finds no serious problems is encouraging, and should be reported and shared.* By contrast, if every data request is a prelude to accusations of error (or worse), then it becomes upsetting to learn that somebody is looking at your data. But data inspection should not imply that there are suspicions or serious concerns. Data requests and data sharing should be the norm -- they cannot be a once-in-a-career disaster.

Post-pub peer review is too important to be just a form of witch-hunting.
It's important, then, that post-publication peer reviewers give the full story. If thirty models give the same result, but one does not, you had better report all thirty-one models.** If somebody spends the time and energy to deal with your questions, post the answers so that the authors need not answer the questions all over again.

I do post-publication peer review because I generally don't trust the literature. I don't believe results until I can crack them open and run my fingers through the goop. I'm a tremendous pain in the ass. But I also want to be fair. My credibility, and the value of my peer reviews, depends on it.

The Court of Salem reels in terror at the perfect linearity of Jens Förster's sample means.

* Sakaluk, Williams, and Biernat (2014) suggest that, during pre-publication peer review, one reviewer run the code to make sure they get the same statistics. This would cut down on the number of misreported statistics. Until that process is a common part of pre-publication peer review, it will always be a beneficial result of post-publication peer review.

** Simonsohn, Simmons, and Nelson propose the specification curve, which takes the brute-force approach to this by reporting the p-value from every reasonable model specification. It's cool, but I haven't tried to implement it yet.


  1. I am not sure that you needed to bring our group into this to raise an interesting point. Just for the record, I have no idea about the identities of the so-called data detectives or the other replicators.

    Below is a link to all of the correspondence regarding that replication. As you will see, Dr. Schnall specifically asked Drs. Lakens and Nosek to keep her concerns about the Johnson et al. data private (see 6 January 2014 email posted below). We did not learn about the concern about the ceiling effect until we saw the commentary sometime in March of 2013 if I can recall correctly. David Johnson asked twice for her insights (see email on 28 October 2013 and 31 October 2013). We did not hear anything until we saw the comment.


    From: Simone
    To: Brian and Daniel
    Hi Daniel and Brian,
    Just to add: Please do not share my analyses with the authors or anybody else until we have come to an agreement on how to proceed. Cheers,

    1. Hi Brent,

      It was not my intention to imply that you or your colleagues were the data detectives to which Dr. Schnall alluded. I am personally glad to see the Donnellan et al. replication and feel that it is an important and helpful article.

      My interest in mentioning Donnellan et al. was only to lend context to Dr. Schnall's blogpost. My understanding of her post is that she was set upon by third-party (i.e., not Donnellan et al.) data detectives interested in the possibility of finding some trace of wrongdoing in her original study. I think in this case it is appropriate that the detectives report any (lack of) findings.

      I have amended the blogpost to clarify this.

  2. Thanks Joe!
    That whole event was unpleasant and I felt like people were talking in broad generalities about who did what back in the summer of 2014. I just wish it would go away but I think that is naïve on my part. At the very least, I think being specific about who did and said what is important given the reputational stakes.

    I think things with PPPR would be so much easier if routine posting of data was the norm. "Data detectives" could therefore analyze anything they wanted to about a study and the raw data would be readily available for the future. For example, I thought it was good that different groups looked at the raw data from Johnson et al. (2014) and could independently evaluate the concerns about ceiling effects for themselves. No one in our group had any clue this was going on and we did not have to answer 3 or 4 different email requests to provide data.

    I am not sure that I totally agree that there is any obligation to report something like “We asked for data in a private email exchange and it checked out” given that the presumption about any published paper is that the underlying data are sound and that the analyses are reproducible. Presumably nobody knows when individual parties send emails requesting data so there is little reputational harm done to the person because the request was private. It is burdensome to answer emails but that can be solved by posting data in the first place.

    Likewise, if experimental materials are public and well-described, then independent parties can replicate the study without much assistance. So I basically think increased transparency about data and materials would solve many of the concerns you are raising. We might just disagree as to whether there is any obligation to signal that the analyses are reproducible and that the raw data seem legitimate. I think these things should be the default assumption.