
A psychologist's thoughts on how and why we play games

Wednesday, January 27, 2016

Power analysis slants funnels but doesn't flatten curves.

I recently had the pleasure of receiving a very thorough and careful peer review, with an invitation to resubmit, at Psychological Bulletin for my manuscript-in-progress "Overestimated Effects of Violent Games on Aggressive Outcomes in Anderson et al. (2010)."

Although it was humbling to find I still have much to learn about meta-analysis, I was also grateful for what has been one of the most instructive peer reviews I have ever received. Sometimes you get peer reviewers who simply don't like your paper, and perhaps never will. Sometimes reviewers can only offer the most nebulous of suggestions, leaving you fumbling for a way to appease everyone. This review, however, was full of blessedly concrete recommendations.

Anyway, the central thrust of my paper is that Anderson et al. (2010) assert that violent-game effects are not overestimated through publication, analytic, or other biases, yet they did not provide funnel plots to support this argument, relying chiefly on the trim-and-fill procedure instead. When you generate those funnel plots, they turn out to be strikingly asymmetrical, suggesting that there may indeed be bias despite the trim-and-fill results.
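(For readers unfamiliar with it, trim-and-fill imputes hypothetical "missing" studies until the funnel plot is symmetrical and then re-estimates the mean effect. A minimal sketch using the metafor package, on fabricated data rather than the Anderson et al. studies:)

```r
library(metafor)

# Fabricated data, purely illustrative -- not the Anderson et al. studies.
# 40 studies whose observed effects grow with their standard errors,
# i.e., a built-in small-study effect.
set.seed(42)
se = runif(40, .05, .35)
d = rnorm(40, mean = .1 + 1.2 * se, sd = se)

model = rma(yi = d, sei = se)
trimfill(model)  # imputes "missing" studies, re-estimates the pooled effect
```

As my paper argues, trim-and-fill can fail to detect or correct bias even when the funnel plot is visibly lopsided, which is why plotting the funnel yourself matters.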

Effects of violent games on aggressive behavior in experiments selected as having "best-practices" methodology by Anderson et al. (2010).

We conducted two new statistical adjustments for bias, which suggest that the effect may actually be quite small. One, PET, uses the funnel plot's asymmetry to estimate what the effect size might be for a hypothetical perfectly-precise study. The other, p-curve, uses the p-values of significant results to estimate the underlying effect size.
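To make PET concrete: it is a precision-weighted regression of observed effect sizes on their standard errors, and the intercept plays the role of that hypothetical perfectly-precise (SE = 0) study. A toy sketch in base R with simulated, bias-free data (not our meta-analytic dataset):

```r
# Simulate 50 unbiased studies of a true effect d = .2
set.seed(1)
n = round(runif(50, 20, 200))      # per-group sample sizes
se = sqrt(2/n)                     # approximate standard error of d
d = rnorm(50, mean = .2, sd = se)  # observed effects, no publication bias

pet = lm(d ~ se, weights = 1/se^2) # PET: WLS meta-regression of d on SE
coef(pet)["(Intercept)"]           # estimated effect at SE = 0
```

With no bias in the data, the intercept should land near the true .2; under small-study bias, it is pulled toward zero.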

One peer reviewer commented that, if the meta-analyzed effect sizes are heterogeneous, with some large and some small, and if researchers are using power analysis appropriately to plan their sample sizes, then true large effects will be studied with small samples and true small effects will be studied with large samples, leading to an asymmetrical funnel plot and the illusion of research bias.
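The reviewer's premise itself is easy to verify: under an a priori power analysis, the required sample size falls sharply as the assumed effect grows. A quick illustration with base R's power.t.test (the simulation code below uses pwr::pwr.t.test, which agrees):

```r
# Per-group n for 80% power, one-tailed alpha = .05, two-sample t-test
n_small = power.t.test(delta = .2, sd = 1, sig.level = .05, power = .8,
                       type = "two.sample", alternative = "one.sided")$n
n_large = power.t.test(delta = .5, sd = 1, sig.level = .05, power = .8,
                       type = "two.sample", alternative = "one.sided")$n
ceiling(n_small)  # roughly 310 per group to detect d = .2
ceiling(n_large)  # roughly 50 per group to detect d = .5
```

So if true effects and planned sample sizes are inversely linked, the biggest observed effects sit in the least precise studies, which is exactly the pattern an asymmetrical funnel plot flags.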

I don't think that's what's going on here, of course. Power analysis is rare in social psychology, especially in the years covered by the Anderson et al. meta-analysis. I'm also not sure how researchers would somehow know, a priori, the effect size they were studying, but then Arina K. Bones does have ironclad evidence of the precognitive abilities of social psychologists.

But even if that were true, I had a hunch it should affect only the funnel plot, which relies on observed effect sizes and sample sizes, and not the p-curve, which relies on the statistical power of studies. So I ran a simulation to see.

Simulated meta-analysis of 1000 studies. True effect size varies uniformly between .1 and .5. Sample sizes selected for 80% one-tailed power. Simulation code at bottom of post.

Sure enough, the funnel plot is very asymmetrical despite the inclusion of all studies. However, the p-curve still shows a clear right skew.

Of course, the meta-analyst should take steps to divide the studies into homogeneous groups so that heterogeneity is minimized and one is not comparing apples and oranges. But I was compelled to test this, and then further compelled to write it down.


Code here:

# power analysis hypothesis:
# Reviewer 3 says:
  # if the size of studies is chosen according to properly 
  # executed power analyses, we would in fact expect to see 
  # an inverse relationship between outcomes and sample sizes 
  # (and so if authors engage in the recommended practice of 
  # planning a study to achieve sufficient power, we are actually 
  # building small-study effects into our literature!). 

# Let's simulate b/c I don't believe p-curve would work that way.

library(pwr)
library(metafor)

# lookup table: per-group n required for 80% power at each true d
d = seq(.1, .5, .01)
n = NULL
for (i in 1:length(d)) {
  n[i] = pwr.t.test(d = d[i], sig.level = .05, power = .8, 
                 type = "two.sample", alternative = "greater")$n
}
# round up b/c can't have fractional n (pwr.t.test returns n PER GROUP)
n = ceiling(n)

# pick a d, pick an n, run an experiment
simLength = 1e3
d_iter = NULL
n_iter = NULL
df_iter = NULL
t_iter = NULL
for (i in 1:simLength) {
  index = sample(1:length(d), 1)
  d_iter[i] = d[index]
  n_iter[i] = n[index]              # per-group n
  df_iter[i] = 2 * n_iter[i] - 2    # two-sample df: n1 + n2 - 2
  # draw a t-statistic from the noncentral t implied by d and n
  t_iter[i] = rt(1, df_iter[i], 
                 ncp = d_iter[i] / sqrt(1/n_iter[i] + 1/n_iter[i]))
}

dat = data.frame(d_true = d_iter,
                 n = n_iter, df = df_iter, t = t_iter)
# back-convert t to observed d for two equal groups of size n
dat$d_obs = 2*dat$t/sqrt(dat$df)
dat$p = pt(dat$t, dat$df, lower.tail = F)
# sampling SE of d (Hedges & Olkin), with total N = 2n per study
dat$se_obs = sqrt(
  (2/dat$n + dat$d_obs^2/(2*dat$df)) * (2*dat$n/dat$df)
)

# random-effects meta-analysis, used for the funnel plot below
model = rma(yi = d_obs, sei = se_obs, data = dat)

# quick look at the full p-value distribution
hist(dat$p)

par(mfrow=c(1, 2))
funnel(model, main = "Funnel plot w/ \npower analysis & \nheterogeneity")
hist(dat$p[dat$p<.05], main = "p-curve w/ \npower analysis & \nheterogeneity", xlab = "p-value")
