Crystal Prison Zone: February 2016

Monday, February 29, 2016

The Hemingway App

The goal of peer review is to check the accuracy of the paper's argument. However, I often have so much trouble reading the paper that I can't figure out what the arguments are, much less check their accuracy. This is frustrating. It makes me spend a lot of time re-reading papers.

I've been thinking about clarity in writing ever since my sister linked me to Hemingway App. It's an online application that judges the readability of a passage. It also suggests where sentences may be too long or involved. Simply editing the demonstration passage was a small epiphany for me, as I realized that sentences could be short.

If you write scientific papers, and if you hope others will read them, I suggest you take a look at Hemingway. You might think twice before writing a sentence that spans four lines.

(Hemingway rates this post at a 9th grade reading level. Experts recommend writing for the public at a 9th grade level at most. The average @real_peerreview abstract seems to rate at a 20th grade reading level.)

Thursday, February 25, 2016

Sick and Tired of Bias-Correction

Rick Deckard, wishing he could've stuck to experimental research. (Blade Runner, 1982)

At present, a tremendous amount of work is going into the development, refinement, and application of tools for the detection of, and correction for, research bias. We now have funnel plots, meta-regression, trim-and-fill, p-curve, p-uniform, selection models, Bayesian selection models, R-index, the Test for Excess Significance, and the Test for Insufficient Variance, to name only a few.

I like these tests because they've helped me to question some research findings that I don't think are quite right. But at the same time, the thought of having to spend the rest of my life performing meta-analyses and using these tools is deeply exhausting to me. Dr. R seems to enjoy what he's doing, but the idea of churning out test after test after test for all eternity does not seem very pleasant.

It's tedious because these tests are all imperfect. Each has its own assumptions of how the data-gathering process works, and those assumptions are often woefully mistaken. Each performs well in a sample of about, oh, 500 perfectly homogeneous studies, which is a problem when most social psychology literatures can muster about 30 studies of questionable homogeneity.

These tests' results are helpful, but they'll never recover the actual raw data. No Egger test can reveal that someone's dissertation started out as N = 410 in a 2 × 2 × 4 design with two outcomes and worked its way down to N = 140 in a 2 × 3 design with one outcome. You'll never recover the actual research findings.

It's for this reason I think that scientific reform is far, far more important than any amount of meta-analytic data-sleuthing. Data-sleuthing lets you rescue hypotheses when you're already 10 or 20 years deep in a research program, maybe improving your estimates by 25-50% or so. But transparent and principled science is the only way to get it right the first time.

So if you ask me, I'd rather everybody played fair from the start. Playing detective is a lousy way to spend one's time. Let's please publish our null results, all our outcomes, our full datasets. Let's abstain from the opportunism of advocacy or the craven greed of hunting for effects to claim as our own. Let's instead assume the impartial and disinterested stance expected of scientists. Papers are so much more fun to write when they're honest, and so much more fun to read, too.

In the meantime, the meta-analyses will probably have to continue until everybody is sufficiently embarrassed that we agree to embrace reform. I hope it won't take much longer.

Monday, February 22, 2016

Why does anybody mess with their data? Pt 2

"I saw the angel in the marble and carved until I set him free." - Michelangelo

I get the impression that some scientists' relationship with their data is something like Michelangelo and his angel. The question for this stripe of scientist is not whether there is a signal in the noise, but rather, how to expose the signal. The resulting data analysis goes something like this:

Discard all the subjects run in the last week of the semester. Now look at only the ones run in one counterbalancing condition. Bin all the observations between 4 and 7 on your Likert scale. Do a median split on the independent variable -- no, better yet, model the interaction of the IV and gender. There it is! p = .054! Let's round this down to .05 and call it statistically significant.

You and I recognize that this sort of process completely invalidates the meaning of the p-value, that fraught, flimsy statistic that summarizes Fisher's concept of evidence. But to the sculptor, the significant p-value represents a truth discovered, and each data-dependent analysis brings one closer to finding that truth. Watching the p-value drop after discarding forty subjects feels like watching the marble fall away to reveal the face of an angel.
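To make the cost of this kind of sculpting concrete, here is a minimal simulation sketch. It is not anyone's actual analysis: the four "forks" (full sample, dropping the last ten subjects per group, and each half of the sample alone) are hypothetical stand-ins for the choices described above, and the test is a simple z-test that assumes a known standard deviation of 1. The null hypothesis is true in every simulated study, so any "significant" result is a false positive.

```python
import math
import random


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known SD = 1."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=2000, n=50, alpha=0.05, seed=1):
    """Compare one planned test against picking the best of several forks."""
    rng = random.Random(seed)
    honest_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: there is no effect.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_full = z_test_p(a, b)
        candidates = [
            p_full,                                 # the planned analysis
            z_test_p(a[:-10], b[:-10]),             # drop the "last week" subjects
            z_test_p(a[: n // 2], b[: n // 2]),     # first half only
            z_test_p(a[n // 2 :], b[n // 2 :]),     # second half only
        ]
        honest_hits += p_full < alpha
        flexible_hits += min(candidates) < alpha    # report whichever "worked"
    return honest_hits / n_sims, flexible_hits / n_sims


honest_rate, flexible_rate = simulate()
print(f"planned analysis false-positive rate: {honest_rate:.3f}")
print(f"best-of-four false-positive rate:     {flexible_rate:.3f}")
```

The planned analysis should land near the nominal 5%, while the best-of-four rate can only be equal or higher, because the planned test is one of the four candidates. Real analysis forks are messier than these, so treat the size of the gap as illustrative only.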

The conviction that there is a signal to be found is the saddest part of the process, to me. The scientific method demands that each hypothesis test be fair, that evidence could be found for the null as readily as it is found for the alternative. But the way the game is played, scientists will continue to think they need statistical significance to publish. It becomes very easy to convince yourself that there is a signal when your career depends on there being a signal.

So to answer my question from a few weeks ago, maybe people aren't deliberately fiddling with their data. Maybe they are just firmly convinced that the p-value means what they want it to. A p < .05 is waiting to be discovered in every dataset; there's an angel in every block of marble.

Tuesday, February 9, 2016

Why does anybody mess with their data?

A few weeks ago, I was listening to a bit of point/counterpoint on the Mother Jones Inquiring Minds podcast. On one episode, Brad Bushman gave an interview about the causes of gun violence, emphasizing the Weapons Priming Effect and the effects of violent video games. (Apparently he and his co-authors have a new meta-analysis of the Weapons Priming Effect; I can't read it because it's still under revision and the authors have not sent me a copy.)

On the other, Inquiring Minds invited violent-media-effect skeptic Chris Ferguson, perhaps one of Bushman's most persistent detractors. Ferguson recounted all the reasons he has for skepticism of violent-game effects, some reasonable, some less reasonable. One of his more reasonable criticisms is that he's concerned about publication bias and p-hacking in the literature. Perhaps researchers are running several studies and only reporting the ones that find significance, or maybe researchers take their null results and wiggle them around until they reach significance. (I think this is happening to some degree in this literature.)
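The first of those mechanisms, running several studies and reporting only the significant ones, is easy to sketch in a simulation. Everything below is an assumption made for illustration: a small true effect of 0.1 standard deviations, 30 subjects per group, a known standard deviation of 1, and a hypothetical rule that a study is "published" only if it is significant in the predicted direction.

```python
import math
import random


def run_study(rng, true_d, n):
    """Simulate one two-group study; return the observed effect and its p-value."""
    treatment = [rng.gauss(true_d, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(treatment) / n - sum(control) / n
    z = diff / math.sqrt(2 / n)              # z-test with known SD = 1
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return diff, p


def file_drawer(n_studies=5000, true_d=0.1, n=30, alpha=0.05, seed=1):
    """Run many studies and 'publish' only the significant, positive ones."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        diff, p = run_study(rng, true_d, n)
        all_effects.append(diff)
        if p < alpha and diff > 0:           # hypothetical publication filter
            published.append(diff)
    return all_effects, published


all_effects, published = file_drawer()
all_mean = sum(all_effects) / len(all_effects)
published_mean = sum(published) / len(published)
print(f"studies run: {len(all_effects)}, published: {len(published)}")
print(f"mean effect, all studies:    {all_mean:.2f}")
print(f"mean effect, published only: {published_mean:.2f}")
```

Under these assumptions, the average across all studies sits near the true effect, while the published subset has to clear the significance threshold and so overstates it. A reader who sees only the published studies has no direct way to know how many were run.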

Surprisingly, this was the criticism that drew the most scoffing from the hosts. University scientists don't earn anything, they argued, so who in their right mind would go into science and twist their results in hope of grant funding? Anyone wanting to make money would have an easier time of it staying far away from academia and going into something more lucrative, like dog walking.

Clearly, the hosts are mistaken, because we know that research fraud happens, publication bias happens, and p-hacking happens. Andrew Gelman's blog today suggests that these things happen when researchers find themselves chasing null hypotheses: due to publish-or-perish pressures, researchers have to find statistical significance. But why does anybody bother?

If the choice is between publishing nonsense and "perishing" (i.e., leaving academia to take a significant pay raise at a real job), why don't we see more researchers choosing to perish?
