Friday, September 16, 2016

A curmudgeonly read of the ZIKV case control study, Brazil 2016 (Lancet ID)

A ‘definitive’ study examining the relationship between ZIKV and microcephaly in Brazil has just been published [1].

I need to preface this by saying that this obviously represents a massive amount of work by a large number of people under very difficult circumstances, with many families making massive sacrifices to be involved, and I am in no way denigrating those efforts. The paper is also very explicit about being a preliminary analysis, yet it is being touted as a definitive causal statement.


So why the curmudgeonly read? Bias, bias, and more bias.

I. Sample size, power and preliminary analyses

The authors state: “The original study aimed to include 200 cases and 400 controls to have 90% power, 95% precision to detect an association with an odds ratio of 2 or greater, assuming that 67% of cases were exposed.”

Power calculations exist for a very good reason: small numbers lie to you. There is a fantastic discussion of this in [2], which notes:

“However, as small studies are particularly susceptible to inflated effect size estimates and publication bias, it is difficult to be confident in the evidence for a large effect if small studies are the sole source of that evidence.”

This is why protocols get approved and pre-filed. Interim analyses are dangerous because small numbers are unstable; I can confidently predict the final OR will be much, much closer to 1.

II. Biologically implausible effect sizes

The overall odds ratios (whether 55.5 or 86.5) are simply biologically implausible. The only comparable OR I've ever seen, and the most iron-clad relationship in epi, is mesothelioma and long-term occupational asbestos exposure, with an OR = 50.0 (25.8–96.8) [3]. If you have long-term exposure, you'll get mesothelioma, and essentially no one else gets mesothelioma.

Looking at another very strong relationship that everyone is familiar with, we have lung cancer and smoking, with an RR = 8.96 (95% CI: 6.73–12.11) [4] (an RR rather than an OR, since it's pooled in a meta-analysis).

III. Biases in analysis

The authors analyze using a “median unbiased estimator for binary data in an unconditional logistic regression model”, which is also called ‘exact logistic’ regression, to reduce instability due to small (or zero) cell counts.

However, the exceedingly wide CI, with an upper bound of +∞, suggests a major problem and a potentially biased estimate, which requires closer examination. There are newer alternatives, particularly so-called Firth logistic regression.

Rerunning the published numbers (ignoring matching and covariates) using Stata’s -firthlogit- gives an OR of 86.5 (95% CI: 4.9 to 1523.4). While still disconcertingly wide, this CI is acceptable for such sparse data.
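
As a rough cross-check, the Firth point estimate for a single binary exposure with no covariates should coincide with the familiar trick of adding 0.5 to every cell of the 2×2 table. The counts used here are my assumption, inferred from the figures quoted in this post rather than from the raw data: a = 13 ZIKV(+) cases, b = 19 ZIKV(-) cases, c = 0 ZIKV(+) controls and d = 62 ZIKV(-) controls.

```latex
\[
\widehat{\mathrm{OR}}_{\text{Firth}}
  = \frac{(a + 0.5)\,(d + 0.5)}{(b + 0.5)\,(c + 0.5)}
  = \frac{13.5 \times 62.5}{19.5 \times 0.5}
  \approx 86.5
\]
```

Note that the zero cell (c = 0) is the reason the uncorrected odds ratio is undefined and the exact upper bound runs off to +∞.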

IV. Loss of controls

The overall OR of 55.5 (8.6 to +∞) [or Firth: 86.5 (4.9 to 1523.4)] is based on 62 controls. However, the authors report moderate levels of refusal (76% agreed, so 20 refused). So what happens if some of those twenty controls that declined to participate were actually ZIKV (+)?

N of 94:                          OR = 86.5 (95% CI: 4.9 to 1523.4; p = 0.002)
N of 114 (5 ZIKV (+) controls):   OR = 9.8  (95% CI: 3.2 to 29.6;   p < 0.001)
N of 114 (10 ZIKV (+) controls):  OR = 4.8  (95% CI: 1.9 to 12.3;   p = 0.001)

While all are still significant, the estimates very rapidly progress from jaw-dropping through interesting to ‘ho-hum’ [5], and statistically significant does not always mean biologically important.
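
For anyone who wants to play with this sensitivity analysis without Stata, here is a minimal Python sketch. It is not the -firthlogit- run itself: it assumes an unmatched 2×2 table with 13 ZIKV(+) and 19 ZIKV(-) cases (counts inferred from the numbers in this post), adds 0.5 to every cell, and uses a Wald-type interval on the log odds ratio. The function name `corrected_odds_ratio` and the scenario labels are purely illustrative.

```python
import math


def corrected_odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg, k=0.5):
    """Odds ratio and Wald-type 95% CI after adding k to every cell."""
    a, b, c, d = (n + k for n in (case_pos, case_neg, ctrl_pos, ctrl_neg))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts (inferred, not from the raw data): 13 ZIKV(+) and
# 19 ZIKV(-) cases; 62 enrolled controls, all ZIKV(-). The last two
# scenarios add back 20 hypothetical refusers, 5 or 10 of them ZIKV(+).
CASE_POS, CASE_NEG = 13, 19
scenarios = {
    "N of 94": (0, 62),
    "N of 114 (5 ZIKV (+) controls)": (5, 77),
    "N of 114 (10 ZIKV (+) controls)": (10, 72),
}

results = {}
for label, (ctrl_pos, ctrl_neg) in scenarios.items():
    results[label] = corrected_odds_ratio(CASE_POS, CASE_NEG, ctrl_pos, ctrl_neg)
    odds_ratio, lower, upper = results[label]
    print(f"{label}: OR = {odds_ratio:.1f} (95% CI: {lower:.1f} to {upper:.1f})")
```

Under these assumptions the point estimates come out at 86.5, 9.8 and 4.8. Changing the number of ZIKV(+) refusers in `scenarios` shows how quickly the estimate collapses once the zero cell is filled.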

Is there reason to think those who refused might be different from those who participated? Yes, I think so: perhaps they lived in outlying neighborhoods, or had different SES or other characteristics that might have a direct impact on the likelihood of being ZIKV(+).

Other issues.

1. High levels of arboviral coinfection were not included in the analysis; this could, and should, have been considered in the regression models, both as interactions and as covariates. These data are rich enough to support a more comprehensive analysis.

2. No controls were ZIKV(+), and 19 (59%) of cases were ZIKV(-); this is truly bizarre. I suspect what’s going on here is that ZIKV is not playing nice in serological tests [6]. Specifically, optical density (titer) responses for anything are a continuum, which requires a cut-off to determine sero-positivity (generally 3 SDs above the mean of a pool of sero-naïves).

If this cutoff is ‘wrong’ for ZIKV antibodies, then there could be massive bias in classifying exposure, so the exposures captured might represent only the very highest levels of viremia, where the risk could indeed be very high. Moreover, the high levels of co-infections suggest something is interfering with the serology in an important way.

While not directly applicable to arboviruses, one example (Helicobacter pylori) found large differences in ORs when using a generic ELISA vs. one tuned for populations-at-risk [7].
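
To make the cut-off point in item 2 concrete, here is a small Python sketch of the 'mean plus 3 SDs of a sero-naïve pool' rule. All optical density values are made up for illustration and the function names are hypothetical; nothing here comes from the study's assay.

```python
import statistics


def seropositivity_cutoff(naive_od, n_sd=3.0):
    """Cut-off = mean of the sero-naive pool plus n_sd sample SDs."""
    return statistics.mean(naive_od) + n_sd * statistics.stdev(naive_od)


def classify(od_values, cutoff):
    """True where an optical density reading exceeds the cut-off."""
    return [od > cutoff for od in od_values]


# Hypothetical optical densities, for illustration only.
naive_pool = [0.08, 0.10, 0.12, 0.09, 0.11]
samples = [0.14, 0.20, 0.45]

cutoff_3sd = seropositivity_cutoff(naive_pool, n_sd=3.0)
cutoff_2sd = seropositivity_cutoff(naive_pool, n_sd=2.0)

print(f"3 SD cut-off: {cutoff_3sd:.3f} -> {classify(samples, cutoff_3sd)}")
print(f"2 SD cut-off: {cutoff_2sd:.3f} -> {classify(samples, cutoff_2sd)}")
```

With these invented numbers the borderline sample (0.14) flips from sero-negative to sero-positive when the threshold drops from 3 SDs to 2 SDs, which is exactly the kind of exposure misclassification that could distort an odds ratio.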

Update 1: I should be clear here. I am not questioning that ZIKV is associated with microcephaly (MC), as it clearly is in NE Brazil, but I am not yet convinced it is the sole risk factor, and the magnitude of that association is entirely unsettled.


1. de Araújo TVB, Rodrigues LC, de Alencar Ximenes RA, de Barros Miranda-Filho D, Montarroyos UR, de Melo APL, et al. Association between Zika virus infection and microcephaly in Brazil, January to May, 2016: preliminary report of a case-control study. Lancet Infect Dis. doi:10.1016/S1473-3099(16)30318-8
2. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14: 365–376. doi:10.1038/nrn3475
3. Rake C, Gilham C, Hatch J, Darnton A, Hodgson J, Peto J. Occupational, domestic and environmental mesothelioma risks in the British population: a case–control study. Br J Cancer. 2009;100: 1175–1183. doi:10.1038/sj.bjc.6604879
4. Gandini S, Botteri E, Iodice S, Boniol M, Lowenfels AB, Maisonneuve P, et al. Tobacco smoking and cancer: A meta-analysis. Int J Cancer. 2008;122: 155–164. doi:10.1002/ijc.23033
5. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev. 2007;82: 591–605. doi:10.1111/j.1469-185X.2007.00027.x
6. De Smet B, Van den Bossche D, van de Werve C, Mairesse J, Schmidt-Chanasit J, Michiels J, et al. Confirmed Zika virus infection in a Belgian traveler returning from Guatemala, and the diagnostic challenges of imported cases into Europe. J Clin Virol. 2016;80: 8–11. doi:10.1016/j.jcv.2016.04.009
7. Yuan J-M, Yu MC, Xu W-W, Cockburn M, Gao Y-T, Ross RK. Helicobacter pylori infection and risk of gastric cancer in Shanghai, China: updated results based upon a locally developed and validated assay and further follow-up of the cohort. Cancer Epidemiol Biomarkers Prev. 1999;8: 621–624.