Should we adjust for gestational age when analysing birth weights? The use of z-scores revisited - Delbaere et al.

1) Delbaere et al. Should we adjust for gestational age when analysing birth weights? The use of z-scores revisited. Human Reproduction 22(8): 2080-83.
2) Wilcox. The Perils of Birth Weight—A Lesson from Directed Acyclic Graphs, Am J Epidemiol;164:1121–112
3) Hernandez-Diaz S, Schisterman EF, Hernan MA. The birth weight “paradox” uncovered? Am J Epidemiol;164: 1115–20.
4) Tu et al. Growth, current size and the role of the 'reversal paradox' in the foetal origins of adult disease: an illustration using vector geometry. Epidemiol Perspect Innov.3: 9.

Some papers ultimately achieve significance beyond their relatively small and specialized initial audiences. This is particularly the case when it comes to methodologic insights. It is for this reason that the study by Delbaere et al cited above merits greater recognition.

In 1951, Simpson was describing the general analysis of contingency tables when he introduced what has since come to be known eponymously as Simpson's Paradox. When discussed at all in introductory texts, it is usually categorized as an example of an 'extreme confounder', which it is. However, the paradox refers more specifically to the demonstration of one effect in aggregate data and the exact opposite effect in each stratum or subgroup that makes up that aggregate. It is perhaps less confusing with a 'toy' example comparing two treatments A and B:

                Males                Females
           Treat A   Treat B    Treat A   Treat B
Success       200        10         19      1000
Failure      1800       190          1      1000
Total        2000       200         20      2000

The above contingency table illustrates the paradox. Treatment A was given to 2020 people, and 219 were cured, for a success rate of 10.8%. In comparison, treatment B cured 1010 of 2200, or 45.9%, and we would therefore select treatment B as superior. However, if we analyze the sex-specific subgroups, the results are different. Among males, the success rate was 10% for treatment A (200/2000) and 5% for treatment B (10/200). Similarly, among females, the success rate was 95% (19/20) for treatment A and 50% (1000/2000) for treatment B. In both cases, treatment A is preferable. As illustrated here, Simpson's paradox arises when a third variable is distributed very unevenly across the treatment groups and is itself strongly related to the outcome, which is able to reverse the sign of the effect. In this case, that variable (sex) acts as a confounder: most males received treatment A, most females received treatment B, and females had much higher success rates under either treatment.
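
For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the rates from the toy table. The counts are those given above; the names (counts, rate, pooled) are only illustrative.

```python
# Counts from the toy table: (successes, total treated) per sex and treatment.
counts = {
    "Males":   {"A": (200, 2000), "B": (10, 200)},
    "Females": {"A": (19, 20),    "B": (1000, 2000)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

def pooled(treatment):
    """Success rate for one treatment with both sexes combined."""
    successes = sum(counts[sex][treatment][0] for sex in counts)
    total = sum(counts[sex][treatment][1] for sex in counts)
    return rate(successes, total)

# Within each sex, treatment A has the higher success rate...
for sex, arms in counts.items():
    print(f"{sex}: A = {rate(*arms['A']):.1%}, B = {rate(*arms['B']):.1%}")

# ...but the pooled comparison points the other way.
print(f"Pooled: A = {pooled('A'):.1%}, B = {pooled('B'):.1%}")
```

The reversal comes entirely from the unequal group sizes: almost all of treatment A's patients are males, who do poorly on either treatment.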

Less well appreciated is the fact that the same phenomenon arises with attempts to adjust more complicated analyses for a factor like birth weight that is also affected by prior exposure, which can introduce a selection bias with an equally dramatic impact on the sign of the observed effect. This bias may occur regardless of the adjustment method (e.g. regression, stratification or z-scores). As these authors explain clearly, the key ingredient is the attempt to stratify by some intermediate variable which is affected by the exposure or shares a common cause with the outcome.
To this extent, their specific results may be of only limited interest (they conclude that it is inappropriate to adjust for gestational age when assessing the effect of single vs. double embryo transfers on birth weight in the setting of assisted reproductive technologies). What is more important is the general strategy they propose to help illuminate the potential dangers involved in such adjustments. In brief, they advocate using directed acyclic graphs (DAGs or causal diagrams) summarizing the relationships between exposures, outcomes, confounders, and intermediate variables. With directed graphs, arrows between exposures and outcomes imply causal relationships, thus providing an intuitive, visual tool for developing and testing alternative causal models. In this specific context, adjustments should be avoided for factors that lie on the causal pathway between exposure and pregnancy outcome unless those factors share no common cause with the outcome of interest. The philosophy is elaborated in greater detail in the commentary by Wilcox, whose concluding remarks are worth consideration:

“It is the mantra of observational studies that we can never rule out unobserved confounding. Perhaps we need a second mantra: Never adjust for covariates just because they are handy. Epidemiologists cannot depend on adjustments (or stratifications of any sort) to bring results closer to the truth. Indeed… baseless adjustments are easily worse than no adjustment at all.”

It should probably be noted that theirs is not the only graphical method for understanding this artifact. Tu et al applied a vector geometric approach to multiple regression analysis to examine the so-called 'foetal origins of adult disease hypothesis', or the inverse association between birth weight and a range of diseases in later life. Just as the study by Delbaere does not invalidate standardization by z-scores in general, this work should not be regarded as a broad attack on this hypothesis. Nevertheless, they quite rightly point out that some of these studies have only been able to demonstrate a statistically significant association by adjusting for current size, a strategy that is susceptible to what they call the 'reversal paradox' and which further reinforces the dangers of uncritical adjustment for potential covariates.

Atul Sharma MD, FRCP(C)

Simpson's Paradox

Can you cite an example of Simpson's Paradox in the real world?

Reply from asharma:

Perhaps I was insufficiently clear, but I was trying to make the point that Simpson’s Paradox also arises in the adjustment for covariates, and the original post therefore contains several relevant examples.

You might also be interested in reading the reply to Dr. Wilcox’s commentary (Hernandez-Diaz et al. Am J Epidemiol 164:1124-25). Here, the authors illustrate how adjustment for birth weight results in estimates of the effects of prenatal factors that are opposite in sign to the unadjusted estimates, which is just “Simpson’s Paradox” when the adjustment factor is a confounder. When the adjustment factor (like birthweight in their example) is also affected by the prior exposure, adjustment may introduce selection bias and the same 'reversal paradox'.
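
To make the mechanism concrete, here is a small Python sketch of my own (not taken from any of the papers cited). Every number in it is hypothetical and chosen only for illustration. It assumes an exposure that raises mortality by 50% in every baby, an unmeasured factor U that strongly increases both the chance of low birth weight and the risk of death, and no effect of birth weight itself on mortality. Risks are computed exactly from these assumed probabilities rather than simulated.

```python
# All probabilities below are hypothetical, for illustration only.
P_U = 0.1  # prevalence of an unmeasured cause of low birth weight and death

# P(low birth weight | exposure e, unmeasured factor u), keyed by (e, u).
# Both the exposure and U make low birth weight more likely.
P_LBW = {(0, 0): 0.05, (1, 0): 0.30, (0, 1): 0.80, (1, 1): 0.90}

def p_death(e, u):
    """Risk of death: exposure multiplies risk by 1.5 in everyone.
    Birth weight has no effect of its own in this assumed model."""
    base = 0.20 if u else 0.01
    return base * (1.5 if e else 1.0)

def risk(e, lbw=None):
    """P(death | exposure = e), optionally within one birth weight stratum."""
    num = den = 0.0
    for u in (0, 1):
        weight = P_U if u else 1 - P_U
        if lbw is not None:
            p_m = P_LBW[(e, u)]
            weight *= p_m if lbw else 1 - p_m
        num += weight * p_death(e, u)
        den += weight
    return num / den

crude_rr = risk(1) / risk(0)
lbw_rr = risk(1, lbw=1) / risk(0, lbw=1)
normal_rr = risk(1, lbw=0) / risk(0, lbw=0)

print(f"Crude risk ratio:               {crude_rr:.2f}")
print(f"Among low birth weight babies:  {lbw_rr:.2f}")
print(f"Among normal weight babies:     {normal_rr:.2f}")
```

Under these assumptions the crude risk ratio recovers the true harmful effect, while the exposure appears protective among low birth weight babies. In that stratum, unexposed babies are much more likely to be small because of U, so stratifying on birth weight ends up comparing groups with very different underlying risk.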

The reason for belaboring the point is to emphasize that there is no easy way to distinguish between ‘right and wrong’, which is the point of Dr. Wilcox’s first mantra that “we can never rule out unobserved confounding”. In graduate school, I would get upset when ‘professional’ statisticians glibly berated subject matter specialists for their statistical naiveté, but this is a good example where solid subject matter expertise is the only way to identify plausible models for testing. Even though DAGs and other devices may help us localize potential confounders, we can ultimately only suspect the problem if we understand the biology and carefully consider the causal structure of the relationships being studied.

For pediatric examples in a more ‘classical’ setting of categorical outcomes and contingency table analysis, you might look at:

Puliyel 2007. Socio-economic Disparities Probably Invalidate Bangladesh Hib Study Conclusions.

Osler et al. 2001. Do Pediatric Trauma Centers Have Better Survival Rates than Adult Trauma Centers? An Examination of the National Pediatric Trauma Registry.
