I get the impression that some scientists' relationship with their data is something like Michelangelo and his angel. The question for this stripe of scientist is not whether there is a signal in the noise, but rather, how to expose the signal. The resulting data analysis goes something like this:
Discard all the subjects run in the last week of the semester. Now look only at the ones run in one counterbalancing condition. Bin all the observations between 4 and 7 on your Likert scale. Do a median split on the independent variable -- no, better yet, model the interaction of the IV and gender. There it is! p = .054! Let's round this down to .05 and call it statistically significant.
You and I recognize that this sort of process completely invalidates the meaning of the p-value, that fraught, flimsy statistic that summarizes Fisher's concept of evidence. But to the sculptor, the significant p-value represents a truth discovered, and each data-dependent analysis brings one closer to finding that truth. Watching the p-value drop after discarding forty subjects feels like watching the marble fall away to reveal the face of an angel.
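It takes only a few lines of simulation to see why this invalidates the p-value. The sketch below is hypothetical (the exclusion rules and sample sizes are invented for illustration, not taken from any real study): data are generated under a true null, but the analyst is allowed to try several data-dependent analyses and report whichever one reaches p &lt; .05. The nominal 5% false-positive rate no longer holds.

```python
# Simulating the sculptor's workflow under a true null effect.
# All design details here (drop-last-week rule, one-counterbalancing
# subset, n = 80) are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def flexible_analysis(y, group, week):
    """Return True if ANY of several post-hoc analyses yields p < .05."""
    candidates = [
        (y, group),                                # full sample
        (y[week < 12], group[week < 12]),          # drop last week's subjects
        (y[: len(y) // 2], group[: len(y) // 2]),  # one counterbalancing only
    ]
    for yy, gg in candidates:
        a, b = yy[gg == 0], yy[gg == 1]
        if len(a) > 1 and len(b) > 1 and stats.ttest_ind(a, b).pvalue < .05:
            return True
    return False

n_sims, n = 2000, 80
hits = 0
for _ in range(n_sims):
    y = rng.normal(size=n)              # pure noise: the null is true
    group = rng.integers(0, 2, size=n)  # two conditions, no real effect
    week = rng.integers(1, 13, size=n)  # week of semester each subject ran
    hits += flexible_analysis(y, group, week)

print(f"False-positive rate with flexible analysis: {hits / n_sims:.3f}")
```

Even with just three correlated looks at the same data, the rate of "significant" results climbs well above the advertised 5%; add binning, median splits, and post-hoc interactions, and it climbs further still.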
The conviction that there is a signal to be found is, to me, the saddest part of the process. The scientific method demands that each hypothesis test be fair, that evidence could be found for the null as readily as for the alternative. But the way the game is played, scientists will continue to believe they need statistical significance to publish. It becomes very easy to convince yourself that there is a signal when your career depends on there being a signal.
So to answer my question from a few weeks ago, maybe people aren't deliberately fiddling with their data. Maybe they are just firmly convinced that the p-value means what they want it to. A p < .05 is waiting to be discovered in every dataset; there's an angel in every block of marble.