# Summary data MR

Summary data are estimates of the genetic associations with the exposure and the outcome (usually β-coefficients and their SEs), often provided by consortia when sharing individual-level data is impractical. A common approach for deriving a causal estimate from summary data with a single SNP is the Wald ratio, in which the coefficient of the SNP-outcome association is divided by the coefficient of the SNP-exposure association. With a single SNP, the Wald ratio gives the same estimate as the 2SLS method. If the outcome is a binary disease trait and the SNP-outcome association is expressed as a log OR, the Wald ratio can be interpreted as the log OR for disease per unit increase in the exposure due to the SNP.
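The Wald ratio can be illustrated with a minimal Python sketch. The function name `wald_ratio`, the input values, and the standard error formula are assumptions made for illustration. The SE uses a common first-order approximation that ignores uncertainty in the SNP-exposure association, which the text above does not specify.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Return the Wald ratio estimate and an approximate SE for one SNP.

    The SE is a first-order approximation (an assumption of this sketch):
    it treats the SNP-exposure coefficient as if it were known exactly.
    """
    if beta_exposure == 0:
        raise ValueError("SNP-exposure association must be non-zero")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary data for a single SNP and a binary outcome:
# the SNP-outcome coefficient is a log OR per effect allele.
estimate, se = wald_ratio(beta_exposure=0.20, beta_outcome=0.05, se_outcome=0.01)

# For a binary outcome the estimate is a log OR per unit increase in the
# exposure, so exponentiating gives the corresponding OR.
odds_ratio = math.exp(estimate)
print(f"log OR = {estimate:.3f} (SE {se:.3f}), OR = {odds_ratio:.3f}")
```

Exponentiating the estimate is appropriate only when the SNP-outcome coefficient is on the log odds scale.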

A common approach to combining summary data from multiple SNPs is weighted linear regression, in which the coefficients of the SNP-outcome associations are regressed on the coefficients of the SNP-exposure associations, with weights equal to the inverse variance of each SNP-outcome association and with the intercept constrained to zero. The slope from this model can be interpreted as the MR estimate of the effect of the exposure on the outcome. This is equivalent to a fixed-effect meta-analysis of the individual Wald ratio estimates and is often referred to as the inverse-variance weighted (IVW) estimate. The rationale for the IVW estimate is given in Figure 3.

Figure 3. Scatter plot of SNP-outcome associations versus SNP-exposure associations for a fictional MR analysis using 13 variants. The IVW estimate is the slope obtained from a weighted linear regression of the SNP-outcome associations on the SNP-exposure associations, with the intercept constrained to zero.
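The equivalence between the weighted regression through the origin and a fixed-effect meta-analysis of Wald ratios can be checked numerically. The sketch below uses three hypothetical SNPs. The function names and values are illustrative assumptions, and the variance of each Wald ratio uses a first-order approximation that ignores uncertainty in the SNP-exposure associations.

```python
import math


def ivw_regression(beta_exposure, beta_outcome, se_outcome):
    """Slope of a weighted regression through the origin (weights 1/se^2)."""
    weights = [1.0 / s**2 for s in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    # Fixed-effect SE: the residual variance is taken to be 1.
    return numerator / denominator, math.sqrt(1.0 / denominator)


def wald_meta_analysis(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance meta-analysis of per-SNP Wald ratios."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse variance of each ratio under the first-order approximation.
    weights = [(bx / s) ** 2 for bx, s in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical summary data for three independent SNPs.
bx = [0.10, 0.20, 0.30]   # SNP-exposure coefficients
by = [0.02, 0.05, 0.06]   # SNP-outcome coefficients
se = [0.01, 0.01, 0.02]   # SEs of the SNP-outcome coefficients

slope, slope_se = ivw_regression(bx, by, se)
pooled, pooled_se = wald_meta_analysis(bx, by, se)
print(f"regression slope: {slope:.4f} (SE {slope_se:.4f})")
print(f"meta-analysis:    {pooled:.4f} (SE {pooled_se:.4f})")
```

The two functions agree because weighting each ratio by the square of its SNP-exposure coefficient divided by the variance of its SNP-outcome coefficient reproduces the regression numerator and denominator term by term.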

Methods based on summary data generally require either that the SNPs be completely independent or that the correlation between SNPs be taken into account, for example, through a variance-covariance matrix of the SNPs based on 1000 Genomes data (58).
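One way to take correlation between SNPs into account is to replace the weighted regression with generalized least squares. This is a sketch of that general idea and not necessarily the method used in reference 58. It assumes that the covariance between two SNP-outcome estimates is the product of their SEs and the SNP-SNP correlation. The function names, the correlation matrix, and the input values are hypothetical. The code uses plain Python to stay dependency-free; in practice a linear algebra library would normally be used.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def correlated_ivw(beta_exposure, beta_outcome, se_outcome, correlation):
    """Generalized least squares estimate allowing for correlated SNPs."""
    n = len(beta_exposure)
    # Assumed covariance of the SNP-outcome estimates: se_i * se_j * rho_ij.
    omega = [[se_outcome[i] * se_outcome[j] * correlation[i][j]
              for j in range(n)] for i in range(n)]
    u = solve(omega, beta_exposure)  # omega^{-1} * beta_exposure
    precision = sum(bx * ui for bx, ui in zip(beta_exposure, u))
    estimate = sum(by * ui for by, ui in zip(beta_outcome, u)) / precision
    return estimate, math.sqrt(1.0 / precision)


# Hypothetical data: the first two SNPs are correlated (r = 0.3).
bx = [0.10, 0.20, 0.30]
by = [0.02, 0.05, 0.06]
se = [0.01, 0.01, 0.02]
rho = [[1.0, 0.3, 0.0],
       [0.3, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

estimate, estimate_se = correlated_ivw(bx, by, se, rho)
print(f"correlated IVW estimate: {estimate:.4f} (SE {estimate_se:.4f})")
```

When the correlation matrix is the identity, this reduces to the ordinary IVW estimate for independent SNPs.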