Crystal Prison Zone: February 2017

Monday, February 27, 2017

Publication bias can hide your moderators

It is a common goal of meta-analysis not only to provide an overall average effect size, but also to test for moderators that cause the effect size to become larger or smaller. For example, researchers who study the effects of violent media would like to know who is most at risk for adverse effects. Researchers who study psychotherapy would like to recommend a particular therapy as being most helpful.

However, meta-analysis does not often generate these insights. For example, research has not found that violent-media effects are larger for children than for adults (Anderson et al., 2010). Similarly, it is often reported that all therapies are roughly equally effective (the "dodo bird verdict," Luborsky, Singer, & Luborsky, 1975; Wampold et al., 1997).

"Everybody has won, and all must have prizes. At least, that's what it looks like if you only look at what got published."

It seems to me that publication bias may obscure such patterns of moderation. Publication bias introduces a “small-study effect” in which the observed effect size is highly dependent on the sample size. Large-sample studies can reach statistical significance with smaller effect sizes. Small-sample studies can only reach statistical significance by reporting enormous effect sizes. The observed effect sizes gathered in meta-analysis, therefore, may be more a function of the sample size than they are a function of theoretically-important moderators such as age group or treatment type.
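To make the small-study effect concrete, here is a minimal sketch of how large an observed effect must be to reach significance at a given sample size. It assumes a two-group design with equal group sizes, a two-tailed test at α = .05, and the large-sample approximation SE(d) ≈ sqrt(2/n). The function name and the sample sizes shown are illustrative choices, not values from the simulation.

```python
from math import sqrt
from statistics import NormalDist

def min_significant_d(n_per_group, alpha=0.05):
    """Smallest standardized mean difference that reaches two-tailed
    significance in a two-group study (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    se = sqrt(2 / n_per_group)  # approximate standard error of d near d = 0
    return z_crit * se

# Hypothetical per-group sample sizes, chosen only for illustration
for n in (10, 20, 50, 200):
    print(f"n = {n:>3} per group: d must exceed about {min_significant_d(n):.2f}")
```

Under these assumptions the threshold shrinks with the square root of the sample size, so a literature filtered on significance will show larger effects in its smaller studies regardless of the true effect.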

In this simulation, I compare the statistical power of meta-analysis to detect moderators when there is, or when there is not, publication bias.

Method

Simulations cover 4 scenarios in a 2 (Effects: large or medium) × 2 (Pub bias: absent or present) design.

When effect sizes were large, the true effects were δ = 0 in the first population, δ = 0.3 in the second population, and δ = 0.6 in the third population. When effect sizes were medium, the true effects were δ = 0 in the first population, δ = 0.2 in the second population, and δ = 0.4 in the third population. Thus, each scenario represents one group with no effect, a group with a medium-small effect, and a group with an effect twice as large.

When studies were simulated without publication bias, twenty studies were conducted on each population, and all were reported. When studies were simulated with publication bias, studies were simulated, then published and/or file-drawered until at least 70% of the published effects were statistically significant. When results were not statistically significant and were file-drawered, further studies were simulated until 20 published results were obtained. This keeps the number of studies k constant at 20, which prevents confounding the influence of publication bias with the influence of fewer observed studies.
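The selection procedure can be sketched roughly as follows. This is one possible reading of the description above, not the code used for the post. The per-group sample-size range, the normal approximation to the sampling distribution of d, the rule that "significant" means significant in the positive direction, and all function names are assumptions made for illustration.

```python
import random
from math import sqrt

Z_CRIT = 1.96  # two-tailed alpha = .05, normal approximation

def simulate_study(delta, rng, n_range=(20, 100)):
    """One two-group study: returns (observed d, standard error, significant?)."""
    n = rng.randint(*n_range)      # per-group n; this range is an assumption
    se = sqrt(2 / n)               # large-sample approximation to the SE of d
    d = rng.gauss(delta, se)
    return d, se, d / se > Z_CRIT  # "significant" = significant and positive

def simulate_literature(delta, rng, k=20, min_sig=14, biased=True):
    """Collect k published studies; with bias, at least min_sig are significant."""
    published, n_nonsig, n_filed = [], 0, 0
    while len(published) < k:
        d, se, sig = simulate_study(delta, rng)
        if biased and not sig and n_nonsig >= k - min_sig:
            n_filed += 1           # file-drawer it and run another study
            continue
        n_nonsig += not sig        # count published nonsignificant results
        published.append((d, se, sig))
    return published, n_filed

rng = random.Random(2017)
for delta in (0.0, 0.3, 0.6):
    pubs, filed = simulate_literature(delta, rng)
    mean_d = sum(d for d, _, _ in pubs) / len(pubs)
    print(f"delta = {delta}: mean published d = {mean_d:.2f}, file-drawered = {filed}")
```

Here `min_sig=14` encodes the 70% threshold for k = 20. Passing `biased=False` publishes every study, which corresponds to the no-bias condition.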

For each condition, I report the observed effect size for each group, the statistical power of the test for moderators, and the statistical power of the Egger test for publication bias. I simulated 500 meta-analyses within each condition in order to obtain stable estimates.
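For readers who want to see what these two tests involve, here is a simplified sketch. In place of a full meta-regression it uses a fixed-effect subgroup comparison (a z-test on the difference between two inverse-variance pooled estimates), and it implements Egger's test as an ordinary least-squares regression of d/SE on 1/SE. The function names and the toy data are hypothetical, and the analysis in the post may have used a different model.

```python
from math import sqrt
from statistics import NormalDist

def pooled(effects):
    """Inverse-variance (fixed-effect) pooled estimate and its SE.
    effects is a list of (d, se) pairs."""
    weights = [1 / se**2 for _, se in effects]
    est = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return est, sqrt(1 / sum(weights))

def moderator_test(group_a, group_b):
    """z-test for the difference between two subgroup estimates."""
    est_a, se_a = pooled(group_a)
    est_b, se_b = pooled(group_b)
    z = (est_b - est_a) / sqrt(se_a**2 + se_b**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return est_b - est_a, z, p

def egger_test(effects):
    """Egger's regression: regress d/se on 1/se by ordinary least squares.
    An intercept far from zero suggests small-study effects.
    Returns the intercept and its t statistic (df = n - 2)."""
    x = [1 / se for _, se in effects]
    y = [d / se for d, se in effects]
    n = len(effects)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid_var = sum((yi - intercept - slope * xi) ** 2
                    for xi, yi in zip(x, y)) / (n - 2)
    se_int = sqrt(resid_var * (1 / n + mx**2 / sxx))
    return intercept, intercept / se_int

# Hypothetical toy data: (d, se) pairs for two groups of studies
group_a = [(0.05, 0.30), (-0.10, 0.20), (0.02, 0.15), (0.08, 0.25)]
group_b = [(0.55, 0.30), (0.70, 0.20), (0.58, 0.15), (0.65, 0.25)]
diff, z, p = moderator_test(group_a, group_b)
print(f"moderator difference = {diff:.2f}, z = {z:.2f}, p = {p:.4f}")
```

Statistical power is then the proportion of simulated meta-analyses in which the relevant test is significant. Note that `egger_test` divides by the intercept's standard error, so it will fail on data that fit the regression line exactly.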

Results

Large effects.

Without publication bias:

In 100% of the metas, the difference between δ = 0 and δ = 0.6 was detected.
In 92% of the metas, the difference between δ = 0 and δ = 0.3 was detected.
In only 4.2% of cases was the δ = 0 group mistaken as having a significant effect.
Effect sizes within each group were accurately estimated (in the long run) as δ = 0, 0.3, and 0.6.

With publication bias:

Only 15% of the metas were able to tell the difference between δ = 0 and δ = 0.3.
91% of meta-analyses were able to tell the difference between δ = 0 and δ = 0.6.
100% of the metas mistook the δ = 0 group as having a significant effect.
Effect sizes within each group were overestimated: d = .45, .58, and .73 instead of 0, 0.3, and 0.6.

Here's a plot of the moderator parameters across the 500 simulations without bias (bottom) and with bias (top).

Moderator values are dramatically underestimated in the context of publication bias.

Medium effects.

Without publication bias:

99% of metas detected the difference between δ = 0 and δ = 0.4.
60% of metas detected the difference between δ = 0 and δ = 0.2.
The Type I error rate in the δ = 0 group was 5.6%.
In the long run, effect sizes within each group were accurately recovered as d = 0, 0.2, and 0.4.

With publication bias:

Only 35% were able to detect the difference between δ = 0 and δ = 0.4.
Only 2.2% of the meta-analyses were able to detect the difference between δ = 0 and δ = 0.2.
100% of meta-analyses mistook the δ = 0 group as reflecting a significant effect.
Effect sizes within each group were overestimated: d = .46, .53, and .62 instead of δ = 0, 0.2, and 0.4.

Here's a plot of the moderator parameters across the 500 simulations without bias (bottom) and with bias (top).

Again, pub bias causes parameter estimates of the moderator to be biased downwards.

Conclusion:

Publication bias can hurt statistical power for your moderators. Obvious differences such as that between δ = 0 and δ = 0.6 may retain decent power, but power will fall dramatically for more modest differences such as that between δ = 0 and δ = 0.4. Meta-regression may be stymied by publication bias.

Monday, February 13, 2017

Why retractions are so slow

A few months ago, I had the opportunity to attend a symposium on research integrity. The timing was interesting because, on the same day, Retraction Watch ran a story on two retractions in my research area, the effects of violent media. Although one of these retractions had been quite swift, the other retraction had been three years in coming, which was a major source of heartache and frustration among all parties involved.

Insofar as some of us are concerned about the possible role of fraud as a contaminating influence in the scientific literature, I thought it might be helpful to share what I learned at the symposium. This regards the multiple steps and stakeholders in a retraction process, which may in part be the cause of common frustrations about the opacity and gradualness of the retraction process.

The Process

On paper, the process for handling concerns about a paper looks something like this:

Somebody points out the concerns about the legitimacy of an article.
The journal posts an expression of concern, summarizing the issues with the article.
If misconduct is suspected, the university investigates for possible malfeasance.
If malfeasance is discovered, the article is retracted.

We can see that an expression of concern can be posted quickly, whereas a retraction can take years of investigation. Because there is no way to resolve investigations faster, scientific self-correction can be expected to be slow. The exception to this is that, when the authors voluntarily withdraw an article in response to concerns, a retraction no longer requires an investigation.

Multiple stakeholders in investigations

Regarding investigations, it is not always clear what is being done or how seriously concerns are being addressed. In the Retraction Watch story mentioned at the top of this post, the whistleblowers spent about three years waiting for action on a data set with signs of tampering.

From the perspective of a scientist, one might wish for a system of retractions that acts swiftly and transparently. Through swiftness, the influence of fraudulent papers might be minimized, and through transparency, one might be apprised of the status of each concern.

Despite these goals, the accused must be considered innocent until found guilty, and therefore retains certain rights and protections. Because an ongoing investigation can harm one's reputation and career, oversight committees will not comment on the status or existence of an investigation.

Even when the accused is indeed guilty, they may recruit lawyers to apply legal pressure to universities, journals, or whistleblowers to avoid the career damage of a retraction. This can further complicate and frustrate scientific self-correction.

Should internal investigation really be necessary?

From a researcher's perspective, it's a shame that retraction seems to require a misconduct investigation. Such investigations are time-consuming. It is also difficult to prove intent absent some confession -- this may be why Diederik Stapel has 58 retractions, but only three of eight suspicious Jens Forster papers have been retracted.

Additionally, I'm not sure that a misconduct investigation is strictly necessary to find a paper worthy of retraction. When a paper's conclusions do not follow from the data, or the data are clearly mistaken, a speedy retraction would be nice.

Sometimes we are fortunate enough to see papers voluntarily withdrawn without a full-fledged investigation. Often this is possible only when there is some escape valve for blame: There is some honest mistake that can be offered up, or some collaborator can be offered as blameworthy. For example, this retraction could be lodged quickly because the data manipulation was performed by an unnamed graduate student. Imagine a different case where the PI was at fault -- it would have required years of investigation.

Summary

Whistleblowers are often upset that clearly suspicious papers are sometimes labeled only with an expression of concern. These frustrations are exacerbated by the opacity of investigations, in that it is often unclear whether there is an investigation at all, much less what progress has been made in the investigation.

Personally, I hope that journals will make effective use of expressions of concern as appropriate. I also appreciate the efforts of honest authors to voluntarily withdraw papers, as this allows for much faster self-correction than would be possible if university investigation were necessary.

Unfortunately, detection of malfeasance will remain time-consuming and imperfect. Retraction is quick only when authors are either (1) honest and cooperative, issuing a voluntary withdrawal, or (2) dishonest but with a guilty conscience, confessing quickly under scrutiny. However, science still has few tools against sophisticated and tenacious frauds with hefty legal war chests.
