Crystal Prison Zone: Fraud and Erroneous Judgment: Varieties of Deception in the Social Sciences (1995)

Killing time in the UChicago stacks in the summer of 2019, I found a book from 1995 called Fraud and Erroneous Judgment in the Social Sciences. It's been an interesting read, because despite having been written nearly 25 years ago, much of it reads like it was written today. Specifically, there is very little substance about actually preventing, detecting, or prosecuting fraud, presumably because all these things are very difficult to do.

Instead, a substantial portion of the text is dedicated to the easier task of fighting the culture war. Nearly half the book consists of polemics from scientists who think their ability to speak hard truths about sexual assault or intelligence or race or whatever has been suppressed by the bleeding hearts. This is particularly depressing and unhelpful when you see that two of the thirteen chapters are written by Linda Gottfredson and J. Phillippe Rushton, scientists receiving funding from the Pioneer Fund, an organization founded to study and promote eugenics.

Fraud...

For a text that is notionally about fraud, there is very little substance about actual fraud. Instead, most of the chapters are dedicated to the latter topic of "fallible judgment". Only three instances of research misconduct in psychology are discussed. Two of them appear in brief bullet points in the first chapter: In one, a psychologist fabricated data to demonstrate the efficacy of a drug for preventing self-harm in the mentally disabled; in the other, a researcher may have massaged his data to overstate the potential harms of low levels of lead exposure.

The third case consists of the allegations surrounding Cyril Burt. Cyril Burt was an early behavior geneticist. He argued that intelligence was heritable, and he demonstrated this through studies of the similarity of identical twins raised apart.

Burt was unpopular at the time because the view that intelligence was heritable sounded to many like Nazi ideology. While he was alive, people protested him as a far-right ideologue. (Other hereditarians experienced similar treatment; Hans Eysenck reportedly needed bodyguards as a result of his 1971 views that some of the Black-White intelligence gap was genetic in nature.)

Five years after his death, allegations arose that Burt had invented a number of his later samples. These allegations claimed that Burt, having found an initial sample that supported his hypothesis, and frustrated by the public resistance to his findings as well as the challenge of finding more identical twins raised apart, decided to help the process along by fabricating data from twin pairs. As evidence of this, his heritability coefficient remained .77 as the sample size increased from 15 twin pairs to 53 twin pairs. (Usually parameter estimates change a little bit as new data comes in.) He was further alleged to have made up two research assistants, but these assistants were later found. Complicating matters further, his housekeeper burnt all his research records shortly after his death (!) purportedly on the advice of one of Burt's scientific rivals (?!?).

Burt sounds like a real horse's ass. In a separate book, Cyril Burt: Fraud or Framed?, Hans Eysenck reports that Burt would sometimes sock-puppet, writing articles according to his own views, then leaving his name off of the work and handing it off to a junior researcher, giving the impression that some independent scholar shared his view. Burt purportedly went one further by editing articles submitted to his journal, inserting his own stances and invective into others' work and publishing it without their approval.

Two chapters in Fraud and Erroneous Judgment are devoted to the Burt affair. The first chapter, written by Robert B. Joynson, argues that, strictly speaking, you can't prove he committed fraud. Probably we will never know. Burt is dead and his records destroyed. Even if he made up the data, the potentially made-up data are at least consistent with what we believe today, so maybe it doesn't matter.

The other, written by the late J. Phillippe Rushton, one-time head of the Pioneer Fund, argues more stridently that Burt was framed. According to his perspective, the various social justice warriors and bleeding hearts of ~~today's~~ the 1970s' hyper-liberal universities couldn't bear the uncomfortable truths Burt preached. Rather than refute Burt's ideas in the arena of logic and facts and science, they resorted to underhanded callout-culture tactics to smear him after his death and spoil his legacy.

So in the only involved discussion of an actual fraud allegation in this 181-page book, all that can be said is "maybe he did, or maybe he didn't."

Some material is useful. Chapter 3 recognizes that scientific fraud is a human behavior that is motivated by, and performed within, a social system. One author theorizes that fraud is most often committed under three conditions: 1) there is pressure to publish, whether to advance one's career or to refute critics, 2) the researcher thinks they know the answer already, so that actually doing the experiment is unneccessary, and 3) the research area involves an amount of stochastic variability, such that a failure to replicate can be shaken off as Type I error or hidden moderators. It certainly sounds plausible, but I wonder how useful it is. Most research fulfills all three conditions: all of us are under pressure to publish, all of us have a theory or two to suggest a "right" answer, and all of us experience sampling error and meta-uncertainty.

One thing that hasn't changed one bit is that demonstrating fraud requires demonstrating intent, which is basically impossible. Then and now, people instead have to couch concerns in the language of error, presuming sloppiness instead of malfeasance. Even then, it's not clear at what level of sloppiness crosses the threshold between error and misconduct.

...and Erroneous Judgment

The other cases all concern "erroneous judgment". They reflect ideologically-biased interpretations of data, a lack of scientific rigor, or an excessive willingness to be fooled. These cases vary in their seriousness. At the extremely harmful end, there is a discussion of recovered-memory therapy; this therapy involves helping patients to recover memories of childhood abuse through a process indistinguishable from that one would use to create a false memory. Chillingly, recovered memories became permissible as court evidence in 15 states and lead to a number of false accusations and possible convictions during the Satanic Panic of the 1980s. At the less harmful end, there's an argument about whether the Greeks made up their culture by copying off of the Egyptians. Fun to think about maybe, but nobody is going to jail over that.

Other examples include overexaggeration of societal problems in order to drum up support for research and advocacy. Neil Gilbert illustrates how moral entrepreneurs can extrapolate from sloppy statistical work, small samples, and bad question wording to estimate that 100 billion children are abducted every 3.7 seconds. This fine example is, however, paired with a criticism of feminism and research on sexual assault that has aged poorly; the author's argument boils down to "c'mon, sexual assault can't be that common, right?" Maybe it can be, Neil.

According to the authors, these cases of fallible judgment are caused by excessive enthusiasm rather than deliberate intention to deceive. Therapists dealing in recovered memories are too excited to root out satanic child-abuse cults, too ignorant of the basic science of memory, and too dependent on the perceived efficacy of their practice to know better. Critics of the heritability of IQ are blinded by political correctness and "the egalitarian hoax" of blank-slate models of human development. Political correctness is cited as influencing "fallible judgments" as diverse as the removal of homosexuality from the DSM (and its polite replacement in diagnosis of other disorders so that homosexual patients could continue billing their insurance), the estimation of the prevalence of sexual harassment, failures to test and report racial differences in outcomes, or the attribution of the accomplishments of the Greeks to the Egyptians.

Again, it seems revealing that so little is known about actual cases of fraud that the vast majority of the volume is dedicated to cases where it is unclear who is right. Unable to discover and discuss actual frauds, the discussion has to focus instead on ideological opponents whom the authors don't trust to interpret and represent their data fairly.

Have we made progress?

What's changed between 1995 and now? Today we have more examples to draw upon and more forensic tools. We can use GRIM and SPRITE to catch what are either honest people making typographical mistakes or fraudsters too stupid to make up raw data (good luck telling which is which!). The Data Colada boys keep coming up with new tests for detecting suspicious patterns in data. It's become a little less weird to ask for data and a little more weird to refuse to share data. So there's progress.

Even so, we're still a billion miles away from being able to detect most fraud and to demonstrate intent. Demonstration of intent generally requires a confession or someone on the inside. Personally, I've suspect that fraud detection at scale is probably impossible unless we ask scientists to provide receipts. I can't imagine researchers going for another layer of bureaucracy like that.

One recurring theme is the absence of an actual science police. The discussion of the Burt affair complains that the Council of the British Psychological Society did little to examine Burt's case on its own, instead accepting the conclusions of a biographer. Chapters 1 and 2 discuss the political events that put "Science under Siege" and lead to the creation of the Office of Research Integrity, an institution only grudgingly accepted in Chapter 2. Huffing that every great scientist from Mendel to Millikan had to massage their data a bit from time to time to make their point, David Goodstein cautions the ORI, "I can only hope that we won't arrange things in such a way as would have inhibited Newton or Millikan from doing his thing."

Can we ever know the truth?

Earlier, I mentioned that the book contains three cases of purported fraud: the self-harm study, Cyril Burt's 38 twin-pairs raised apart, and the researcher possibly massaging his data to overestimate the harms of lead. This last case appears to be a reference to the late Herbert Needleman, accused in 1990 of p-hacking his model, an offense Newsweek described at the time as "like bringing a felony indictment for jaywalking." Needleman was exonerated in 1992, and the New York Times ran an obituary honoring him following his death in 2017.

Would I be impressed by Needleman's work today, or would I count him out as another garden-variety noise-miner looking for evidence to support a foregone conclusion? Maybe it doesn't matter. In the Newsweek article, the EPA is quoted as saying "We don't even use Needleman's study anymore" because subsequent research recommended even lower safety thresholds than did Needleman's controversial work. The tempest has blown over. The winners write their history, and the losers get paid by the Cato Institute to go on Fox News and argue against "lead hysteria".

There's a lot that hasn't changed

We think that science has only been subjective, partisan, and politicized in our current "war on science" post-2016 world, but the 1990s also had "science under siege" (Time, Aug 26, 1991) and intractable debates between competing groups with vested interests in there being a crisis or not being a crisis. The tobacco wars reappear in every decade.

Similarly, the froth and stupidity of daytime TV lives on in today's Daily Mail and Facebook groups. In the 90s, people with more outrage than sense believed in vast networks of underground Satanist cults that tortured children and "programmed" them to become pawns in their world domination scheme. Today, those people believe the Democratic party runs child trafficking ring through a pizza parlor and a furniture website and that Donald Trump is on a one-man mission to stop them.

Regarding fraud, we find that scientific self-policing only tends to emerge in response to crisis and scandal. NIH and NSF don't seem to have had formal recommendations regarding fraud until 1988; these were apparently motivated by pressure from Congress following the 1981 case of John Darsee, a Harvard cardiologist who had been faking his data. Those who do scientific self-policing aren't welcomed with open arms -- the book briefly stops to sneer at Walter Stewart and Ned Feder as "a kind of self-appointed truth squad. According to their critics, they had not been very productive scientists and were trying to find a way of holding on to their lab space." Nobody likes having fraud oversight, and everybody does the minimum possible to maintain public respectability until the scandal blows over.

Finally, each generation seems to suspect its successors of being fatally blinded by political correctness. This is clearest in the chapter dedicated to the defense of Cyril Burt, in which Rushton complains that academia will only become more corrupted by political correctness:

Today, the campus radicals of earlier decades are the tenured radicals of the 1990s. Some are chairmen, deans, and presidents. The 1960s mentality of peace, love, and above all equality now constitutes a significant portion of the intellectual establishment in the Western world. The equalitarian dogma is more, not less, entrenched than ever before. Yet, it is based on the scientific hoax of the century.

Will every generation of academics forever consider their successors insufferably and disreputably woke? Should they? It seems that, despite Rushton's concerns, the hereditarian perspective has won out in the end. Today we have researchers who not only recognize heritability, but have given careful thought to the meaning, causality, and societal implications of the research. I see this as tremendous progress when compared to the way the book tends to frame the debate over heritability, which invites the reader to choose between two equally misguided perspectives of either ignorant blank-slate idealism or Rushton's inhumane "race realism."

Summary

Some things have changed since 1995, but much has stayed the same.

Compared to 25 years ago, I think we have a better set of tools for detecting fraud. We have new statistical tricks and stronger community norms around data sharing and editorial action. We have the Office of Research Integrity and Retraction Watch.

But some things haven't changed. Researchers checking each other's work are still, at times, regarded coldly: the "self-appointed truth squad" of 1995 is the "self-appointed data police" of 2016. Demonstrating intent to deceive remains a very high bar for those investigating misconduct; probably some number of fraudsters escape oversight by claiming mere incompetence. Because it is difficult to prove intent to deceive, it's easier to fight culture war -- one can wave to an opponent's political bias without getting slapped with a libel suit. And we still don't know much about who commits fraud, why they commit fraud, and how we'll ever catch them.

Thursday, October 15, 2020

Fraud and Erroneous Judgment: Varieties of Deception in the Social Sciences (1995)