Monday, March 28, 2016

Asking for advice re: causal inference in SEM

I'm repeatedly running into an issue in causal interpretation of SEM models. I'm not sure what to make of it, so I want to ask everybody what they think.

Suppose one knows A and B to be highly correlated in the world, but one doesn't know whether there is causality between them.

In an experiment, one stages an intervention. Manipulation X causes a difference in levels of A between the control and treatment groups.

Here's the tricky part. Suppose one analyses the data gleaned from this experiment using SEM. One makes an SEM with paths X -> A -> B. Each path is statistically significant. This is presented as a causal model indicating that manipulation X causes changes in A, which in turn cause changes in B. 

Paths X->A and A->B are significant, but X->B is not. Is a causal model warranted?

However, if one tests the linear models A = b1×X and B = b2×X, one finds that b1 is statistically significant, but b2 is not. (Note that I am not referring to the direct effect of X on B after controlling for A. Rather, the "raw" effect of X on B is not statistically significant.)

This causes my colleagues and me to wonder: Does the SEM support the argument that, by manipulating X, one can induce changes in A that cause downstream changes in B? Or does the manipulation inject new variance in A that is unrelated to B, with the SEM fitting only because of the preexisting large correlation between A and B?
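To make the second possibility concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration and not our actual data: an unobserved common cause `U` produces the A-B correlation, X is randomly assigned and shifts A only, and A has no causal effect on B. The helper `ols_slope`, the coefficients, and the sample size are all made up.

```python
import numpy as np

def ols_slope(x, y):
    """Return (slope, t statistic) for the simple regression y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (xc @ xc))
    return slope, slope / se

rng = np.random.default_rng(0)
n = 2000                                     # assumed sample size
U = rng.normal(size=n)                       # hypothetical unobserved common cause
X = rng.integers(0, 2, size=n)               # randomized manipulation (0 = control)
A = 0.5 * X + U + 0.3 * rng.normal(size=n)   # X shifts A; U also drives A
B = U + 0.3 * rng.normal(size=n)             # B depends on U only, never on A

b1, t_b1 = ols_slope(X, A)      # path X -> A
a_b, t_ab = ols_slope(A, B)     # path A -> B (driven here by U alone)
b2, t_b2 = ols_slope(X, B)      # "raw" effect of X on B

print(f"X -> A: b = {b1:.3f}, t = {t_b1:.1f}")
print(f"A -> B: b = {a_b:.3f}, t = {t_ab:.1f}")
print(f"X -> B: b = {b2:.3f}, t = {t_b2:.1f}")
print(f"total effect implied by X -> A -> B: {b1 * a_b:.3f}")
```

Under these assumptions both paths of the chain come out strongly significant even though manipulating X does nothing to B, and the total effect implied by multiplying the two paths is far from the observed raw effect. The sketch only looks at path coefficients; it says nothing about the overall fit of the SEM.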

Can you refer me to any literature on this issue? What are your thoughts?

Thanks for any help you can give, readers.


  1. Some top-of-the-head thoughts...

    Normally in the full mediation model you would retain the residual or "direct" X -> B path as well. Is that included in your model but just not shown in the path diagram? If it's not included in the model, does the pattern you describe persist when the direct path is added?

    I think the preferred way of testing the indirect effect is not by conducting the joint significance test of X -> A and A -> B, but rather by testing the product of those two regression coefficients. It's possible that doing it this way would be less likely to lead to surprising results, but I couldn't say with any confidence whether that should be true in general. Also worth mentioning that it's my understanding that while the joint significance test is not the preferred method, in most cases it seems to work okay (as suggested by some simulation studies -- I think Dominique Muller has written some about this).

    In cases where the total effect of X -> B (your coefficient b2) is not significant but the indirect effect is significant, this is often attributed to a suppressive relationship among the variables. That is, the coefficient for the direct X -> B effect (not included in your diagram) and the product of the X -> A and A -> B coefficients (estimating the indirect effect) have opposite signs. You could check whether that seems to be true in your data.

    As for the causal question. With observational data like this, I think finding a significant indirect effect in an SEM like this would be "consistent with" the mediation hypothesis -- that is, it's more likely to happen if the mediation hypothesis is true than if it is false -- but the apparent indirect effect could also be due to (a) reverse or reciprocal causation between A and B, and/or (b) the errors perturbing A and B being correlated, which would be a consequence of omitted variable bias.

  2. Uh that's weird. At least as you drew it, that is the complete mediation model (aka X is an instrumental variable) and that can't happen in that model.
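The checks suggested in the first comment (testing the product of the two coefficients, and comparing the signs of the direct and indirect effects) could be sketched roughly as follows. This is an illustration under assumptions, not a recommended analysis: the data are simulated with a deliberately suppressive pattern, the percentile bootstrap is just one common way to test the product, and the helper names `fit`, `mediation_paths`, and `bootstrap_indirect` are hypothetical.

```python
import numpy as np

def fit(y, *predictors):
    """OLS coefficients for y on an intercept plus the given predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def mediation_paths(X, A, B):
    """Return (a, b, direct, total) for the X -> A -> B mediation model."""
    a = fit(A, X)[1]                # X -> A
    _, b, direct = fit(B, A, X)     # A -> B given X, and direct X -> B
    total = fit(B, X)[1]            # raw X -> B
    return a, b, direct, total

def bootstrap_indirect(X, A, B, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    n = len(X)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        a, b, _, _ = mediation_paths(X[idx], A[idx], B[idx])
        draws[i] = a * b
    return np.percentile(draws, [2.5, 97.5])

# Simulated example with assumed coefficients: the indirect effect (0.5 * 0.6)
# and the direct effect (-0.3) cancel, so the total effect is near zero.
rng = np.random.default_rng(1)
n = 1000
X = rng.integers(0, 2, size=n)
A = 0.5 * X + rng.normal(size=n)
B = 0.6 * A - 0.3 * X + rng.normal(size=n)

a, b, direct, total = mediation_paths(X, A, B)
lo, hi = bootstrap_indirect(X, A, B)
print(f"indirect a*b = {a * b:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
print(f"direct = {direct:.3f}, total = {total:.3f}")
```

With ordinary least squares the total effect equals the direct effect plus a*b exactly, so a near-zero total next to a clearly nonzero indirect effect implies a direct effect of opposite sign. Note that this decomposition requires estimating the direct X -> B path, which the model as drawn omits.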