A psychologist's thoughts on how and why we play games

Tuesday, March 22, 2016

Results-blinded Peer Review

The value of any experiment rests on the validity of its measurements and manipulations. If the manipulation doesn't have the intended effect, or the measurements are just noise, then the experiment's results will be uninformative.

This holds whether the results are statistically significant or not. A nonsignificant result, obviously, could be the consequence of an ineffective manipulation or a noisy outcome variable. But when the manipulation or the measurement is invalid, a significant result is just as uninformative: it is either a Type I error or a reflection of bias in the measurement.
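The Type I error case can be made concrete with a small simulation. The following is a minimal sketch, assuming a two-group experiment in which the manipulation does nothing at all (both groups are drawn from the same normal distribution). The function names, the sample size of 50 per group, and the large-sample cutoff of 1.96 are illustrative assumptions, not taken from any real study.

```python
import random
import statistics


def significant_by_chance(n_per_group, rng, critical=1.96):
    """Run one null experiment and report whether it comes out 'significant'."""
    # Both groups come from the same distribution: the manipulation has no effect.
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]

    # Welch-style test statistic, compared against a normal-approximation cutoff.
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(control) / n_per_group
          + statistics.variance(treated) / n_per_group) ** 0.5
    return abs(diff / se) > critical


def false_positive_rate(n_experiments=2000, n_per_group=50, seed=1):
    """Share of null experiments that reach significance anyway."""
    rng = random.Random(seed)
    hits = sum(significant_by_chance(n_per_group, rng)
               for _ in range(n_experiments))
    return hits / n_experiments


if __name__ == "__main__":
    print(false_positive_rate())
```

Under these assumptions, roughly one experiment in twenty should come out significant even though there is nothing to find, and none of those significant results says anything about the hypothesis.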

The problem I have is that the reader's (or at least, the reviewer's) perception of the method's validity often hinges upon the results obtained. Where a significant result might have been hailed as a successful conceptual replication, a nonsignificant result might be dismissed as a departure from appropriate methodology.

It makes me consider this puckish lesson from Archibald Cochrane, as quoted and summarized on Ben Goldacre's blog:

The results at that stage showed a slight numerical advantage for those who had been treated at home. I rather wickedly compiled two reports: one reversing the number of deaths on the two sides of the trial. As we were going into the committee, in the anteroom, I showed some cardiologists the results. They were vociferous in their abuse: “Archie,” they said “we always thought you were unethical. You must stop this trial at once.”

I let them have their say for some time, then apologized and gave them the true results, challenging them to say as vehemently, that coronary care units should be stopped immediately. There was dead silence and I felt rather sick because they were, after all, my medical colleagues.

Perhaps, just once in a while, such a results-blinded manuscript should be submitted to a journal. Once Reviewers 1, 2, and 3 have all had their say about the ingenuity of the method, the precision of the measurements, and the adequacy of the sample size, the true results could be revealed, and one could see how firmly the reviewers hold to their earlier arguments.

Thankfully, the increasing prevalence of Registered Reports may forestall the need for any such underhanded prank. Still, it is fun to think about.

1 comment:

  1. Your prank could actually make for a really interesting meta-scientific experiment. Make up a novel topic of study (so knowledge of prior lit isn't a confound), manipulate methods quality and results in a factorial design, and then assess reviewer decisions. Would be an interesting peek into the evaluative processes reviewers use.
