Causal direction

Having recently addressed the issue of causation in general, I would like to now talk a bit about the problem of causal direction. As I alluded to earlier, even when one is able to use properly controlled statistics to establish a clear causal relationship between two variables, it is still another matter to determine which one is causing the other. Ideally, a researcher wants whatever their proposed independent variable (IV) is to cause their proposed dependent variable (DV) - that is, if they are trying to argue that cigarette smoking causes cancer, then cigarette smoking is the IV and cancer is the DV.

And now you are thinking, “Of course cigarette smoking causes cancer!” It is pretty absurd to imagine a world where cancer induces the habit of smoking cigarettes and not the other way around. But the simplicity of this particular example, while useful for illustration, belies the complexity of the problem. The question to ask is *how* we know that it is cigarette smoking that causes cancer, and not cancer that causes cigarette smoking. Can we determine this by simply comparing smokers with non-smokers and cancer sufferers with non-sufferers in a statistical sense, or are we forced to rely on other arguments if we wish to take a definitive stance?

As far as I know (though please correct me if I’m wrong), there is no established academic response to this line of questioning. Different scholars take different approaches in different situations. In our smoking example, one likely would cite non-statistical material as indicative of the harmful effects of smoking (case studies, clinical trials, etc.), and then use it in conjunction with statistical research to show the full scope of the issue. While such arguments are neither parsimonious nor elegant, they are very typical and generally compelling (assuming that the various parts are all valid themselves).

Unfortunately, some fields do not have the luxury of employing controlled studies or trials, and in these cases the answer to causal direction is far less clear. When one simply has to “make do” with the data at hand, determining causal direction becomes a great deal trickier. Consider this example:

You have a theory that war is partially caused by the level of nationalism and nationalistic rhetoric present in the initiating country. The story for such a theory is relatively plausible - nationalism brings about a feeling of elitism and superiority, which is conducive to domestic approval for aggressive military action. You decide to operationalize nationalism based on the use of nationalistic rhetoric in mainstream media. And so you collect your data, with a DV of initiating war and an IV of nationalism, plus some controls (major power status, GDP, etc.).

You run your regression, and huzzah! There is a clear and statistically significant correlation between nationalism and war. But wait: you potentially have what is known as “simultaneity bias”, because your regressors may be endogenous rather than exogenous. In other words, you are modeling y as a function of x, when in reality the value of x may itself depend on y. If the two sides of the equation simultaneously cause one another, then x is correlated with the error term of the regression, which means that once the math is done your estimates will be biased (obviously a bad thing if you’re trying to derive policy implications or advice from them).
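To see how large this bias can be, here is a minimal simulation sketch in plain Python. Everything in it is an assumption made for illustration: the two-equation feedback model, the coefficient values, the function names, and the treatment of war as a continuous "propensity" rather than a yes/no event. It is not real data on nationalism or war.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_feedback(n=50_000, beta=0.5, gamma=0.5, seed=0):
    """Draw (x, y) pairs from a hypothetical two-way causal system:

        y = beta  * x + u   (x causes y; beta is the effect we want)
        x = gamma * y + v   (y feeds back into x)

    Solving the two equations together gives x in terms of the shocks.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved shock to y
        v = rng.gauss(0, 1)  # unobserved shock to x
        x = (gamma * u + v) / (1 - beta * gamma)
        y = beta * x + u
        xs.append(x)
        ys.append(y)
    return xs, ys


x, y = simulate_feedback()
naive_estimate = ols_slope(x, y)
print(f"true effect of x on y: 0.50, naive regression estimate: {naive_estimate:.2f}")
```

With these made-up parameters the regression slope settles near 0.8 in large samples, even though the true effect of x on y is 0.5. The feedback from y to x is being counted as if it were part of the effect of x on y.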

What’s the remedy? In a sense, the remedy is to get better data. Specifically though, you want to get what is called an instrumental variable (or “instrument”; it is sometimes loosely called a “proxy”, though strictly speaking a proxy variable is a different thing). This is a variable that is good at predicting your x variable (the independent variable or regressor) but that is not caused by your y variable and affects y only through x. It’s okay if this seems confusing because it is, so here’s an example:

Let’s go back to our attempt to predict war as a function of nationalism. We can’t just measure nationalism concurrently with war, as it is reasonable to suspect that it is influenced by war and thus unsuitable as a regressor. And so, we want to find an instrumental variable that is good at predicting nationalism but not dependent on the occurrence of war. One possibility would be to isolate events known to encourage nationalism (independence day, Olympics/international sports competitions, maybe elections) and use them as indicators of how nationalistic the country is. Holidays and sports events are generally scheduled well in advance and thus shouldn’t be influenced by the possibility of war (I realize events may be cancelled, but the indicator is still sound), and thus are possibly suitable instruments for nationalism. Thus if one can identify a statistically significant relationship between pro-nationalism events and war, then the nationalism-causes-war argument is at least somewhat validated.
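The logic can be sketched with simulated data. In the snippet below, z stands in for scheduled nationalism-inducing events, x for nationalism, and y for a continuous "war propensity". All of the numbers, names, and the model itself are hypothetical, chosen only to show the mechanics. The estimator used is the simplest instrumental-variable formula, cov(z, y) / cov(z, x), which is what two-stage least squares reduces to when there is one instrument and one regressor.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n


def simulate_with_instrument(n=50_000, beta=0.5, gamma=0.5, strength=1.0, seed=0):
    """Hypothetical system with feedback plus an instrument z:

        y = beta  * x + u                   (nationalism -> war)
        x = gamma * y + strength * z + v    (war and events -> nationalism)

    z is drawn independently of u and v, so it moves x but is not caused
    by y and reaches y only through x.
    """
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # instrument (e.g. scheduled events)
        u = rng.gauss(0, 1)  # unobserved shock to y
        v = rng.gauss(0, 1)  # unobserved shock to x
        x = (gamma * u + strength * z + v) / (1 - beta * gamma)
        y = beta * x + u
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


z, x, y = simulate_with_instrument()

# Naive regression of y on x: contaminated by the feedback from y to x.
ols_estimate = covariance(x, y) / covariance(x, x)

# Instrumental-variable estimate: uses only the variation in x driven by z.
iv_estimate = covariance(z, y) / covariance(z, x)

print(f"true effect: 0.50, naive: {ols_estimate:.2f}, instrumented: {iv_estimate:.2f}")
```

In this toy setup the naive estimate overshoots the true effect while the instrumented one lands close to it. The catch is that this only works because z was constructed to be independent of the shocks; with real data that independence has to be argued for, not assumed.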

Of course, not everybody is running regressions and dealing with simultaneity and endogeneity in the mathematical sense. Yet non-statistical arguments can also fall prey to the issue of causal direction, as is clearly explained by A Rulebook for Arguments (pg. 38, linked to in the reading suggestions page). After considering the problem of whether television causes a decline in morals or vice-versa, the book offers the brief but potent bit of insight that one must simply evaluate which direction has the more plausible causal story. And most notably, if both are plausible, then perhaps causation simply runs in both directions - after all, reality is complicated.

In the social sciences we are, as usual, left with more puzzles than answers. It is precisely these nuances that require us to be very careful in our theorizing - academically, political science is very young (50 or 60 years old, compared to thousands for mathematics and other disciplines), and we are still making only “baby steps” in our explanations for society. Overall, the best way to deal with causal direction is to be very conservative and precise in one’s assertions - don’t expect to provide a grand unified theory of society, but instead pick a specific puzzle, generate a plausible argument, and test it rigorously. Whether your initial assertions prove to be correct or incorrect, you will have made a contribution to the body of knowledge that is social science. Eventually, we may be ready to say bigger things, but for now the small puzzles are all we can handle.
