Variant selection to maximize variance explained in cis-Mendelian randomization.

HGG advances
Abstract

Optimal selection of instrumental variables (IVs) from a single gene region in cis-Mendelian randomization (cis-MR) is challenging because variants are highly correlated due to linkage disequilibrium (LD). Using only the lead variant is convenient but may not achieve full statistical power if multiple signals exist. We compared four selection methods that incorporate correlated non-lead variants: LD-pruning, conditional and joint analysis (COJO), Sum of Single Effects (SuSiE) regression, and principal component analysis (PCA). We evaluated their ability to increase instrument strength, measured by variance explained in the exposure (R²), relative to the lead-variant-only approach. We applied these methods to circulating haptoglobin (HP), to simulated traits with known variance explained, and to 15 additional gene regions where non-lead cis-pQTLs contributed varying proportions of cis-genetic variance. R² was estimated from variant-protein association estimates (Fenland study, N=10,708) using LD from the UK Biobank (N=356,557). In the HP region, the four methods produced a median proportional gain in R² of 145.1% compared with the lead variant alone (range: 69.6% to 169.4%), with a median reduction in the MR standard error of 36.3% (range: 19.3% to 37.9%). In simulations, all methods were able to recover the expected genetic variance. Across the 15 gene regions, methods incorporating non-lead variants consistently outperformed the lead-variant-only approach. Variant selection methods incorporating correlated non-lead variants can reliably improve instrument strength in cis-MR analyses. We recommend using such methods but advise comparing their estimates with the lead-variant-only estimate to safeguard against numerical instability.
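To make the instrument-strength comparison concrete, the following is a minimal sketch of how variance explained could be computed from summary statistics and an LD matrix. It is not the authors' pipeline. It assumes standardized genotypes and exposure, so that the joint R² for a set of correlated variants is β'Σ⁻¹β, where β holds the marginal effect estimates and Σ is the LD correlation matrix. The function names `variance_explained` and `ld_prune`, the pruning threshold, and the toy numbers are all hypothetical, and sampling error in β (which inflates R²) is ignored.

```python
import numpy as np

def variance_explained(beta_std, ld):
    """Joint R^2 from standardized marginal effects and an LD correlation matrix.

    Assumption: genotypes and exposure are standardized, so R^2 = beta' LD^{-1} beta.
    """
    beta_std = np.asarray(beta_std, dtype=float)
    ld = np.asarray(ld, dtype=float)
    # Solve LD @ joint = beta instead of inverting LD explicitly.
    joint = np.linalg.solve(ld, beta_std)
    return float(beta_std @ joint)

def ld_prune(beta_std, ld, r2_threshold=0.1):
    """Greedy LD pruning: keep variants by decreasing |beta|, dropping any
    whose squared correlation with an already kept variant reaches the threshold."""
    beta_std = np.asarray(beta_std, dtype=float)
    ld = np.asarray(ld, dtype=float)
    order = np.argsort(-np.abs(beta_std))
    kept = []
    for i in order:
        if all(ld[i, j] ** 2 < r2_threshold for j in kept):
            kept.append(int(i))
    return kept

# Toy region: two variants with LD correlation 0.5 (hypothetical values).
ld = np.array([[1.0, 0.5],
               [0.5, 1.0]])
beta = np.array([0.40, 0.35])  # marginal standardized effects

r2_lead = float(np.max(np.abs(beta)) ** 2)   # lead-variant-only R^2
r2_joint = variance_explained(beta, ld)      # both variants, LD-adjusted
gain_pct = 100.0 * (r2_joint - r2_lead) / r2_lead

print(f"lead-only R^2: {r2_lead:.3f}")
print(f"joint R^2:     {r2_joint:.3f}")
print(f"proportional gain: {gain_pct:.1f}%")
```

In this toy example the second variant carries an independent signal, so the joint R² exceeds the lead-variant-only value. When Σ is close to singular, as with very highly correlated variants, the linear solve becomes unstable, which is one reason to compare multi-variant estimates against the lead-variant-only estimate.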

Year of Publication
2026
Journal
HGG advances
Pages
100573
Date Published
01/2026
ISSN
2666-2477
DOI
10.1016/j.xhgg.2026.100573
PubMed ID
41548043