

Tuesday, January 26, 2021

I tried to report scientific misconduct. How did it go?

This is the story of how I found what I believe to be scientific misconduct and what happened when I reported it.

Science is supposed to be self-correcting. To test whether science is indeed self-correcting, I tried reporting this misconduct via several mechanisms of scientific self-correction. The results have shown me that psychological science is largely defenseless against unreliable data.

I want to share this story with you so that you understand a few things. You should understand that there are probably a few people in your field producing work that is either fraudulent or so erroneous it may as well be fraudulent. You should understand that their work is cited in policy statements and included in meta-analyses. You should understand that, if you want to see the data or to report concerns, those things happen according to the inclinations of the editor-in-chief at the journal. You should understand that, if the editor-in-chief is not inclined to help you, they are generally not accountable to anyone, and they can always ignore you until the statute of limitations runs out.

Basically, it is very easy to generate unreliable data, and it is very difficult to get it retracted.

Qian Zhang

Two years ago, I read a journal article that appeared to have gibberish for all its statistics (Zhang, Espelage, & Zhang, 2018). None of the numbers in the tables added up: the p values didn't match the F values, the F values didn't match the means and SDs, and the degrees of freedom didn't match the sample size. This was distressing because the sample size was a formidable 3,000 participants. If these numbers were wrong, they were going to receive a lot of weight in future meta-analyses. I sent the editor a note saying "Hey, none of these numbers make sense." The editor said they'd ask the authors to correct, and I moved on with my life.

 


Figure 1. Table from Zhang, Espelage, & Zhang (2018). The means and SDs don’t make sense, and the significance asterisks are incorrect given the F values.

Then I read the rest of Dr. Zhang's first-authored articles and realized there was a broader, more serious problem – one that I am still spending time and energy trying to clean up, two years later.

 

Problems in Qian Zhang’s articles

Zhang’s papers would often report impossible statistics. Many papers had subgroup means that could not be combined to yield the grand mean. For example, one paper reported mean task scores of 8.98 ms and 6.01 ms for males and females, respectively, but a grand mean task score of 23 ms.
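This is easy to verify for yourself: a grand mean is a weighted average of the subgroup means, so it has to fall somewhere between them. Here is a minimal sketch in Python; the group sizes are hypothetical, but any split gives the same conclusion.

```python
# A grand mean is a weighted average of subgroup means, so it must lie
# between them. The group sizes below are hypothetical; the conclusion
# holds for any positive group sizes.
n_male, n_female = 1500, 1500        # assumed split
mean_male, mean_female = 8.98, 6.01  # reported subgroup means (ms)

grand_mean = (n_male * mean_male + n_female * mean_female) / (n_male + n_female)
print(grand_mean)  # ~7.5 ms, nowhere near the reported 23 ms
```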

Other papers had means and SDs that were impossible given the range. For example, one study reported a sample of 3,000 children with ages ranging from 10 to 20 years (M = 15.76, SD = 1.18), of which 1,506 were between ages 10 and 14 and 1,494 were between ages 15 and 20. If you put those numbers into SPRITE, you will find that, to meet the reported mean and SD of age, all the participants must be between the ages of 14 and 19, and only about 500 participants could be age 14.
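SPRITE does this by searching for distributions consistent with the reported statistics, but the core of this particular check can be sketched by hand. Here is a minimal version in Python, which gives the reported numbers their best chance by assuming every participant in the younger group is exactly 14 years old:

```python
import math

# Give the reported statistics their best chance: put every 10-14-year-old
# at exactly age 14, which minimizes the spread contributed by that group.
n_total, n_young, n_old = 3000, 1506, 1494
reported_mean, reported_sd = 15.76, 1.18

age_young = 14.0
# The older group's mean age is then forced by the reported grand mean:
age_old = (n_total * reported_mean - n_young * age_young) / n_old
print(round(age_old, 2))  # ~17.53

# Even with zero variance inside each group, the spread between the two
# groups alone puts a floor under the total SD:
var_between = (n_young * (age_young - reported_mean) ** 2 +
               n_old * (age_old - reported_mean) ** 2) / n_total
print(round(math.sqrt(var_between), 2), "vs reported SD of", reported_sd)  # ~1.77 > 1.18
```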

More seriously still, tables of statistical output seemed to be recycled from paper to paper. Two different articles describing two different experiments on two different populations would come up with very similar cell means and F values. Even if one runs exactly the same experiment twice, sampling error means that the odds of getting all six cells of a 2 × 3 design to come up again to within a few decimal places are quite low. The odds of getting them in an entirely different experiment years later in a different population would be smaller still.
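To put a rough number on that intuition, here is a back-of-the-envelope sketch in Python. The within-cell SD, cell size, and tolerance below are purely illustrative assumptions, not values taken from Zhang's papers, but the qualitative point survives a wide range of choices.

```python
from math import sqrt
from statistics import NormalDist

# Probability that all six cell means of a 2 x 3 design land within +/- tol
# of their values from an earlier experiment, given only sampling error.
# The SD, cell size, and tolerance are illustrative assumptions.
sd = 100.0          # assumed within-cell SD of RTs (ms)
n_per_cell = 300    # assumed participants per cell
tol = 0.05          # "within a few decimal places"

se_diff = sd * sqrt(2.0 / n_per_cell)  # SE of the difference between two cell means
p_one_cell = NormalDist(0.0, se_diff).cdf(tol) - NormalDist(0.0, se_diff).cdf(-tol)
print(p_one_cell)       # ~0.005 per cell under these assumptions
print(p_one_cell ** 6)  # ~1e-14 for all six cells at once
```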

As an example, consider this table, published in Zhang, Espelage, and Rost (2018, Youth and Society; Panel A), in which 2,000 children (4th-6th grade) perform a two-color emotion Stroop task. The means and F values closely match those from a sample of 74 high schoolers (Zhang, Xiong, & Tian, 2013, Scientific Research: Health; Panel B) and a sample of 190 high schoolers (Zhang, Zhang, & Wang, 2013, Scientific Research: Psychology; Panel C).



Figure 2. Three highly similar tables from three different experiments by Zhang and colleagues. The degree of similarity for all nine values of the table is suspiciously high.


Dr. Zhang publishes some corrigenda 

After my first quick note to Youth and Society that Zhang’s p values didn't match the F values, Dr. Zhang started submitting corrections to journals. What was remarkable about these corrections was that they simply added an integer to the F values so that they would be statistically significant.

Consider, for example, this correction at Personality and Individual Differences (Zhang, Tian, Cao, Zhang, & Rodkin, 2016):


Figure 3. An uninterpretable ANOVA table is corrected by the addition or subtraction of an integer value from its F statistics.

The correction just adds 2 or 3 onto the nonsignificant values to make them match their asterisks, and it subtracts 5 from the significant F value to make it match its lack of asterisks.


Or this correction to Zhang, Espelage, and Zhang (2018, Youth and Society), now retracted:

 


Figure 4. Nonsignificant F values become statistically significant through the addition of a tens digit. Note that these should now have three asterisks rather than one and two, respectively.

Importantly, none of the other summary or inferential statistics had to be changed in these corrigenda, as one might expect if there was an error in analysis. Instead, it was a simple matter of clobbering the F values so that they’d match the significance asterisks.
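Checks like this require no special tooling. For a sample this large, the critical F values are easy to look up, so anyone can verify whether an F statistic is consistent with its asterisks. Here is a minimal sketch in Python; the degrees of freedom are illustrative, for a one-way comparison in a sample of about 3,000.

```python
from scipy.stats import f

# Critical F values for illustrative degrees of freedom (1, 2994):
# any F below these thresholds cannot carry the corresponding asterisks.
df1, df2 = 1, 2994
print(f.ppf(0.95, df1, df2))   # ~3.85 for p < .05  (*)
print(f.ppf(0.99, df1, df2))   # ~6.64 for p < .01  (**)
print(f.ppf(0.999, df1, df2))  # ~10.84 for p < .001 (***)
```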


Asking for raw data

While I was investigating Zhang’s work from 2018 and earlier, he published another massive 3,000-participant experiment in Aggressive Behavior (Zhang et al., 2019). Given the general sketchiness of the reports, I was getting anxious about the incredible volume of data Zhang was publishing. 

I asked Dr. Zhang if I could see the data from these studies to try to understand what had happened. He refused, saying only the study team could see the data. 

So, I decided I’d ask the study team. I asked Zhang’s American co-author if they had seen the data. They said they hadn't. I suggested they ask for the data. They said Zhang refused. I asked them if they thought that was odd. They said, no, "It's a China thing."

 

Reporting Misconduct to the Institution

Given the recycling of tables across studies, the impossible statistics, the massive sample sizes, the secrecy around the data, and the corrigenda which had simply bumped the F values into significance, I suspected I had found research misconduct.  In May 2019, I wrote up a report and sent it to the Chairman of the Academic Committee at his institution, Southwest University Chongqing. You can read that report here.

A month later, I was surprised to get an email from Dr. Zhang. It was the raw data from the Youth & Society article I had previously asked for and been refused.

Looking at the raw data revealed a host of suspicious issues. For starters, participants were supposed to be randomly assigned to a movie condition, but girls and students with high trait aggression were dramatically more likely to be assigned to the nonviolent movie.

There was something else about the reaction time data that was a little more technical but very serious. Basically, reaction time data on a task like the Stroop should show within-subject effects (some conditions have faster RTs than others) and between-subject effects (some people are faster than others). Consequently, even an incongruent trial from Quick Draw McGraw could be faster than a congruent trial from Slowpoke Steven.

Because of these between-subject effects, there should be a correlation between a subject’s reaction times in one condition and their reaction times in the other. If you look at color-Stroop data I grabbed from a reliable source on the OSF, you can see that correlation is very strong. 


Figure 5. The correlation between subjects' mean congruent-word RT and mean incongruent-word RT in a color-word Stroop task. Data from Lin, Inzlicht, Saunders, & Friese (2019).

If you look at Zhang’s data, you see the correlation is completely absent. You might also notice that the distribution of subjects’ means is weirdly boxy, unlike the normal or log-normal distribution you might expect.

Figure 6. The correlation between subjects' mean aggressive-word RT and nonaggressive-word RT in an aggressive-emotion Stroop task. Data from Zhang, Espelage, and Rost (2018). The distribution of averages is odd, and the correlation unusually weak.
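Anyone with trial-level data can run this check themselves. Here is a minimal sketch in Python; the file name and column names are hypothetical.

```python
import pandas as pd

# Assumes trial-level data with one row per trial and hypothetical columns:
# 'subject', 'condition' ('congruent' or 'incongruent'), and 'rt' in ms.
trials = pd.read_csv("stroop_trials.csv")  # hypothetical file name

# Average each subject's RTs within each condition...
subject_means = (trials
                 .groupby(["subject", "condition"])["rt"]
                 .mean()
                 .unstack("condition"))

# ...then correlate the two condition means across subjects. In genuine
# Stroop data this correlation is typically strong, because fast people
# are fast in both conditions.
print(subject_means["congruent"].corr(subject_means["incongruent"]))
```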

There was no way the study was randomized, and there was no way that the study data was reliable Stroop data. I wrote an additional letter to the institution detailing these oddities. You can read that additional letter here.

A month after that, Southwest University cleared Dr. Zhang of all charges.

The letter I received declared: "Dr. Zhang Qian was deficient in statistical knowledge and research methods, yet there is insufficient evidence to prove that data fraud [sic]." It explained that Dr. Zhang was just very, very bad at statistics and would be receiving remedial training and writing some corrigenda. The letter noted that, as I had pointed out, the ANOVA tables were gibberish and the degrees of freedom did not match the reported sample sizes. It also noted that the "description of the procedure and the object of study lacks logicality, and there is a suspicion of contradiction in the procedure and inconsistency in the sample," whatever that means.

However, the letter did not comment on the strongest pieces of evidence for misconduct: the recycled tables, the impossible statistics, and the unrealistic properties of the raw data. I pressed the Chairman for comment on these issues. 

After four months, the Chairman replied that the two experts they consulted determined that "these discussions belong to academic disputes." I asked to see the report from the experts. I did not receive a reply.

 

Reporting Misconduct to the Journals

The institution being unwilling to fix anything, I decided to approach the journals. In September and October 2019, I sent each journal a description of the problems in the specific article each had published, as well as a description of the broader evidence for misconduct across articles. 

I hoped that these letters would inspire some swift retractions, or at least, expressions of concern. I would be disappointed.

Some journals appeared to make good-faith attempts to investigate and retract. Other journals have been less helpful.


The Good Journals

Youth and Society reacted the most swiftly, retracting both articles two months later.

Personality and Individual Differences took 10 months to decide to retract. In July 2020, the editor showed me a retraction notice for the article. I am still waiting for the retraction notice to be published. It was apparently lost when changing journal managers; once recovered, it then had to be sent to the authors and publisher for another round of edits and approvals.

Computers in Human Behavior is still investigating. The editor received my concerns with an appropriate degree of attention, but some confusion about whether the editor or the publisher is supposed to investigate seems to have slowed down the process.

I felt these journals generally did their best; the slowness likely comes from the bureaucracy of the process and from editors' inexperience with it. Other journals, I felt, did not make such an attempt.


Aggressive Behavior

In October 2019, Zhang sent me the data from his Aggressive Behavior article. I found the data had the same bizarre features that I had found when I received the raw data from Zhang's now-retracted Youth and Society article. I wrote a letter detailing my concerns and sent it to Aggressive Behavior's editor in chief, Craig Anderson. 

The letter, which you can read here, detailed four concerns. One was about the plausibility of the average Stroop effect reported, which was very large. Another was about failures of random assignment: chi-squared tests found the randomly-assigned conditions differed in sex and trait aggression, with p values of less than one in a trillion. The other two concerns regarded the properties of the raw data.
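The randomization check itself is nothing exotic. Here is a minimal sketch in Python using made-up counts, not the actual counts from Zhang's data, just to show the shape of the calculation.

```python
from scipy.stats import chi2_contingency

# Condition-by-sex table. The counts below are placeholders, not the actual
# counts from the Aggressive Behavior data; under random assignment the
# violent and nonviolent columns should differ only by chance.
#          violent  nonviolent
table = [[ 900,     620],   # boys  (hypothetical)
         [ 624,     904]]   # girls (hypothetical)

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)  # with imbalances like these, p is astronomically small
```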

It took three months and two emails to the full editorial board to receive acknowledgement of my letter. Another four months after that, the journal notified me that it would investigate. 

Now, fifteen months after the submission of my complaint, the journal has made the disappointing decision to correct the article. The correction explains away the failures of randomization as an error in translation; the authors now claim that they let participants self-select their condition. This is difficult for me to believe. The original article stressed its use of random assignment multiple times and described the design as a "true experiment." The authors also had perfectly equal samples per condition ("n = 1,524 students watched a 'violent' cartoon and n = 1,524 students watched a 'nonviolent' cartoon"), which is exceedingly unlikely to happen without random assignment.

The correction does not mention the multiple suspicious features of the raw data. 

This correction has done little to assuage my concerns. I feel it is closer to a cover-up. I will express my displeasure with the process at Aggressive Behavior in greater detail in a future post.

 

Zhang’s newest papers

Since I started contacting journals, Zhang has published four new journal articles and one ResearchSquare preprint. I also served as a peer reviewer on two of his other submissions: One was rejected, and the other Zhang withdrew when I repeatedly requested raw data and materials.

These newest papers all carefully avoid the causes of my previous complaints. I had complained that it was implausible for Zhang to collect 3,000 subjects for every experiment; the sample sizes in the new studies range from 174 to 480. I had complained that the distribution of aggressive-trial and nonaggressive-trial RTs within a subject didn’t make sense; the new studies analyze and present only the aggressive-trial RTs, or they report a measure that does not require RTs.

Two papers include a public dataset as part of the online supplement, but the datasets contain only the aggressive-trial RTs. When I contacted Zhang, he refused to share the nonaggressive-trial RTs. He has also refused to share the accuracy data for any trials. This might be a strategy to avoid tough questions about the kind of issues I found in his Youth & Society and Aggressive Behavior articles. 

Because Zhang refused me access to the data, I had to ask the editors at those journals to enforce the APA Code of Ethics, Section 8.14, which requires sharing data for the purpose of verifying results.

At Journal of Experimental Child Psychology, I asked editor-in-chief David Bjorklund to intervene. Dr. Bjorklund has asked Dr. Zhang to provide the requested data. I thank him for upholding the Code of Ethics. A month and a half has passed since Dr. Bjorklund's intervention, and I have yet to receive the requested data and materials from Dr. Zhang.

At Children and Youth Services Review, I asked editor-in-chief Duncan Lindsey to intervene. Zhang claimed that the data consisted only of aggressive-trial RTs, and that he could not share the program because it “contained many private information of children and had copyrights.”

I explained my case to Lindsey. Lindsey sent me nine words — "You will need to solve this with the authors." — and never replied again.

Dr. Lindsey's failure to uphold the Code of Ethics at his journal is shameful. Scholars should be aware that Children and Youth Services Review has chosen not to enforce data-sharing standards, and research published in Children and Youth Services Review cannot be verified through inspection of the raw data.

I have not yet asked for the data behind Zhang’s new articles in Cyberpsychology, Behavior, and Social Networking or Journal of Aggression, Maltreatment, & Trauma.


Summary

I was curious to see how the self-correcting mechanisms of science would respond to what seemed to me a rather obvious case of unreliable data and possible research misconduct. It turns out Brandolini’s Law still holds: “The amount of energy needed to refute bullshit is an order of magnitude larger than to produce it.” However, I was not prepared to be resisted and hindered by the self-correcting institutions of science itself.

I was disappointed by the response from Southwest University. Their verdict has protected Zhang and enabled him to continue publishing suspicious research at great pace. However, this result does not seem particularly surprising given universities' general unwillingness to investigate their own and China's general eagerness to clear researchers of fraud charges.

I have also generally been disappointed by the response from journals. It turns out that a swift two-month process like the one at Youth and Society is the exception, not the norm.

In the cases that an editor in chief has been willing to act, the process has been very slow, moving only in fits and starts. I have read before that editors and journals have very little time or resources to investigate even a single case of misconduct. It is clear to me that the publishing system is not ready to handle misconduct at scale.

In the cases that an editor in chief has been unwilling to act, there is little room for appeal. Editors can act busy and ignore a complainant, and they can get indignant if one tries to go around them to the rest of the editorial board. It is not clear who would hold the editors accountable, or how. I have little leverage over Craig Anderson or Duncan Lindsey besides my ability to bad-mouth them and their journals in this report. At best, they might retire in another year or two and I could have a fresh editor with whom to plead my case.

The clearest consequence of my actions has been that Zhang has gotten better at publishing. Every time I reported an irregularity with his data, his next article would not feature that irregularity. In essence, each technique for pointing out the implausibility of the data can be used only once, because an editor’s or university’s investigation consists of showing the authors all the irregularities and asking for benign explanations. This is a serious problem when even weak explanations like “I didn’t understand what randomized assignment means” or “I’m just very bad at statistics” are considered acceptable.

Zhang has reported experiments with sample sizes totaling more than 11,000 participants (8,000 given the Aggressive Behavior correction). This is an amount of data that rivals entire meta-analyses and ManyLabs projects. If this data is flawed, it will have serious consequences for reviews and meta-analyses.

In total, trying to get these papers retracted has been much more difficult, and rather less rewarding, than I had expected. The experience has led me to despair for the quality and integrity of our science. If data this suspicious can’t get a swift retraction, it must be impossible to catch a fraud equipped with skills, funding, or social connections.

50 comments:

  1. None of this is remotely surprising. I sometimes wonder if editors are selected for their lack of willingness to rock the boat, or if they are just typical of most working scientists who, it seems, are happy to collude, intentionally or not and by commission or omission, with perpetrators of fraud.

    ReplyDelete
  2. I'm just mind-boggled that these papers got through peer-review

    ReplyDelete
  3. You're doing the lord's work. I can't imagine how frustrating, demotivating, and tedious the effort is. Thanks for the dedication!

    ReplyDelete
  4. A huge thank you to you for really trying here, and even for questioning data in the first place. If the world had more people like you we'd be in a much better place.

    ReplyDelete
  5. Thanks man! Journals seem bad overall (due to bad incentives?). Would be interesting in the future to have AI analyze openly published articles in a more stringent format. The sheer amount of lousy research slipping through the cracks is stupendous. https://www.youtube.com/watch?v=kVk9a5Jcd1k

    ReplyDelete
  6. Thanks for putting up a fight, however futile editors make it seem.

    ReplyDelete
  7. There really needs to be some ranking or quality judgement of journals, so they have more incentive to take these issues seriously.

    ReplyDelete
  8. Second the thanks.

    I have to say. We really do have to make use of the one power we actually have which is the review process. When called upon to review a paper, if there is any doubt, then shoot it down...

    I have seen the exact same behaviour in the review process where people are using the feedback to just get better at playing the system. Even to absurd degrees like removing all substantial claims in the conclusions and still expecting publication!

    ReplyDelete
  9. Great work the universe is better for your efforts. This is being discussed over at hacker news https://news.ycombinator.com/item?id=25922799

    ReplyDelete
  10. Brandolini’s Law also means we need 10x more people like you!! Thank you!

    ReplyDelete
  11. This makes me fear what is happening in more politicized fields.

    ReplyDelete
  12. This is amazing work. Quite terrifying the implications you point out. Policies and academic courses will be influenced incorrectly by these papers. Worse yet is the deterioration of the credibility of science overall. When it becomes this tedious to find and fight one fraud, the willpower to read and trust other articles corrodes.

    ReplyDelete
  13. Because I am aware of the integrity of Dr. John Lash, I do not doubt that this exposé was done precisely to unearth these kinds of articles that are driven by the "publish or perish" syndrome and by the pursuit of grant money. As a reviewer, based on this investigation, you can be sure that I will be much more careful when evaluating reports whose designs and statistical tests are not clear and may hide these kinds of false claims.

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. Can we now just admit that peer-review == self-review?

    perhaps add it to the browser substitution list?
    https://xkcd.com/1288/

    Too cynical?

    ReplyDelete
  16. Why don't you just go out and say this Dr. Zhang is a fraud and has simply fabricated his research? It's pretty clear that is the case.

    ReplyDelete
    Replies
    1. What would that accomplish? Everyone who reads this blog can clearly see that Zhang is a fraud and his institution is garbage for defending him. The main objective is to remove his fraudulent data from the literature.

      Delete
  17. Thanks for this work. Wish I could help in any way

    ReplyDelete
  18. It's deeply upsetting that Southwest University is doing nothing to punish Zhang, or at a bare minimum, create some guardrails for his future publishing. However, I think you're being rather hard on the editors of these journals. These are all super niche publications, largely without full time editors--we're not talking about Nature here. Demanding aggressive timelines for what is sensitive and needs to be done with care from people whose main job is being a researcher and lecturer is asking quite a bit.

    ReplyDelete
    Replies
    1. Personality and Individual Differences is anything but niche and is pretty used to editorial crap. They publish plenty of ethically/scientifically infuriating stuff without a problem. They were the ones that published the race-science bullshit about skin pigmentation and aggressiveness.

      Delete
    2. Though in fairness they did take action when this was brought to their attention!

      Delete
  19. Thanks for doing all of this work...it's much needed. I suggest reaching out to RetractionWatch, as they may be able to give you a bigger podium that could help shame people into action.

    I also think reaching out to Espelage's institution could spur some action.

    ReplyDelete
  20. This looks like at least two orders of magnitude of effort, so I conclude that there's a second pile of bullshit concealed beneath the stench of the first.

    ReplyDelete
  21. If this is the way that Southwest University Chongqing responded, it sounds as if no research from anyone at that institution should be trusted.

    ReplyDelete
  22. This is why I teach my students to not believe in everything they read, just because it is a peer review journal. Thank you.

    ReplyDelete
  23. Thank you for standing for your (our) principles. Keep up the good work. Look out for preprints that cite Zhang's work - especially meta-analysis manuscripts. Perhaps you can stop the damage before it happens.

    ReplyDelete
  24. We need a revolutionary movement against the inertia and disengagement of the editors and publishers.

    It's clear that many are simply mercenaries, with little regard for the integrity of the public resource that they claim to be stewards of.

    ReplyDelete
  25. No surprise. Even the big names: https://youtu.be/b-eTDKRB3Vg

    ReplyDelete
  26. Peer review is basically dead as a concept, and there's a reason half of the population basically laughs when someone says "follow the science."

    Where? To the imaginary garbage heap it originated from?

    ReplyDelete
  27. I had complained to a few journals about scientific misconduct in peer review. A few articles were reviewed in the same laboratory from which they were submitted, and it happened in front of me. However, none of the journals has taken any action.

    ReplyDelete
  28. Amazing article, but might I recommend not sending all the observed problems in a single message? The way you did it was really interesting, informative, and respectful, but I feel like there's a technique that matches the energy (effort) of whoever replies and makes the degree of scientific misconduct much clearer.

    One point supporting that is that verbal communication (and a debate) is linear: Someone proclaims a contradiction, the attacked person responds, then that someone proclaims another (or attacks the response), then the attacked person responds, etc.

    When you send a message saying "There are 7 facts that contradict the paper and they are this and that", you are launching a single argument and in the perspective of the Chairman it translates to: "The paper is flawed because of this single, atomic problem I noticed in the data: blah blah blah"

    It might take much longer as each message will only address one problem, but by sending these problems one at a time and waiting for a response, it becomes clear that the paper does not have one "mistake", but rather is entirely a mistake.

    There's also an additional pressure because after the fourth or fifth problem you communicate, the person is either tired of you saying the paper has flaws (with arguments that need time to deconstruct/make up lies), or he is forced to say "I don't care, just let me publish falsehood in peace!"

    Anyways, it might be too difficult to implement, just in case you ever come across terrible cases of scientific misconduct that you want to put a large effort into being retracted.

    ReplyDelete
    Replies
    1. Hi Unknown,

      It is my impression that editors, institutions, committees, etc. all investigate only once, if ever. It seems to me that gradually dribbling out the evidence over a series of investigations might be less effective -- people will grow exasperated with you and start ignoring you rather than launching a fourth investigation. It also gives people more time to cover their tracks, for the data to be lost, for somebody to change institutions, or for somebody to stop replying to emails.

      In another case I've handled, the editor was very firm that the results of the first investigation were final and that he considered any further inquiries to be "harassment."

      For those reasons, I think it is best to put all the evidence in a single package. In this case, I wanted to make my best good-faith summary of all the evidence for misconduct. Southwest University's decision not to consider all the relevant evidence would probably be the same regardless of my communication strategy.

      Delete
    2. I had not considered this and it makes sense, thanks.

      Delete
  29. This comment has been removed by the author.

    ReplyDelete
  30. Hello, Joe!
    I have sent you an email and hope to have a brief interview with you on this issue. I am looking forward to your reply. thank you.

    ReplyDelete
  31. Hello, Joe!
    I have sent you an email yesterday and hope to have a brief interview with you on this issue. I am looking forward to your reply. thank you.

    ReplyDelete
    Replies
    1. Hi Wang, I replied to your gmail account at Feb 2, 7 AM Beijing time.

      Delete
    2. Hello, Joe!
      I checked my email again and found that I didn't receive your reply. I think it may be due to network problems or other reasons. Sorry for wasting your time! I will send it to you again, thank you for your cooperation! Good luck!

      Delete
    3. This comment has been removed by a blog administrator.

      Delete
  32. This comment has been removed by the author.

    ReplyDelete
  33. Now that you have started down this path, devote time to securing the funding to found an institution dedicated to the promotion of integrity and transparency in scientific publication.

    ReplyDelete
  34. Hopefully something can be done since it is already on Science! And have you by any chance checked out his new publications?

    ReplyDelete
  35. Thanks a lot for this blog post, and thanks even more for your hard work. Being a scientist means, in my opinion, that you care about unreliable parts of what's often called 'the body of scientific knowledge'. Such parts need to be removed (= retracted) when it indeed turns out that there is a preponderance of evidence that they are rotten. It's very hard work and only possible for people who are very tough. Please continue with your work and do not get disappointed when you get no response.

    I am working together with others to get retracted a fraudulent study on the breeding biology of the Basra Reed Warbler in a Taylor & Francis journal. Max Kasparek, the Editor-in-Chief of this journal, has not communicated with me since the beginning of May 2015. Publisher Taylor & Francis has blocked my e-mail account and told me to block all other e-mail accounts as well (This as a response on a formal request, round 2, on access to the raw research data). I fail to understand why Taylor & Francis is forcing me to use snailmail when communicating with them.

    The details of our efforts are listed in a preprint at https://www.researchgate.net/publication/344770291 A slightly revised version is at the moment at journal #34. Some of the responses from editors of journals are listed in the preprint. Anyone any idea why Pippa Smart (mentioned in my preprint) does not communicate with me about her article at https://ese.arphahub.com/article/52201/ and anyone any idea why the Editor-in-Chief of this journal does not communicate with me as well?

    I won't go into the details of the outcome of the correspondence with others about our efforts to get retracted this fraudulent study and I won't speculate about the motives of the non-response of many parties on queries from my side.

    ReplyDelete
  36. Overall great article.

    Just one point: Regarding "Other papers had means and SDs that were impossible given the range." I don't quite agree with this, or at least I think this point needs more convincing arguments. Not sure what SPRITE is doing, but I am sure that there are a lot of distributions with the given mean/standard deviation/constraints, even some that include children aged 10, 11, and so on.

    ReplyDelete
    Replies
    1. I think SPRITE does a good job searching the possible distributions and feel pretty confident that there is no such distribution.

      If you try to work out such a distribution yourself, I think you will see that it is impossible.

      To get the mean all the way up to 15.76, you need a high mean in both the 10-14 participants and the 15-20 participants. Let's give every participant in each group the same age to reduce the variance within groups and keep this example simple. Every participant in the 10-14 group is exactly age 14. To achieve a mean of 15.76, then, participants in the 15-20 group must have a mean age of 17.53. If the two groups have mean ages of 14 and 17.53, then the SD of age in the total sample is already 1.77 -- well in excess of the reported SD of 1.18.

      Having any 10, 11, 12, 13 year-olds will require having more 18, 19, 20 year olds, further increasing the SD of age. I don't think it's possible, but I'd be interested to see such a distribution.

      Delete
    2. Ah yes, correct. I underestimated the effect of the standard deviation.

      I did not mean that when I typed my initial comment, but if one were extremely gracious, perhaps the mean and standard deviation are using continuous ages (i.e., a lot of people are around age 14.99, which count toward age group 14 but get counted as 14.99 for the mean and standard deviation). But overall, I agree that the numbers are very very likely just wrong.

      Delete
  37. The numbers should not be adjusted or altered. If not, the whole research study should be repeated and challenged by a different organization.

    ReplyDelete
  38. This comment has been removed by the author.

    ReplyDelete