Simplifying causal gene identification in GWAS loci.

PLoS genetics

Authors	Marijn Schipper, Jacob Ulirsch, Danielle Posthuma, Stephan Ripke, Karl Heilbron
Abstract	Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful but often use complex black box models trained on datasets containing biases. Here, we used a data-driven approach to construct a truth set of causal genes in 200 GWAS loci. We found that a simple logistic regression model performed as well as a more complex XGBoost model, and that many commonly-used gene prioritization features could be removed without meaningfully affecting performance (e.g., expression quantitative trait locus colocalization and Mendelian randomization). We present CALDERA, a gene prioritization tool that uses a logistic regression model and uses just four input features. In independent benchmarking datasets of resolved GWAS loci, CALDERA achieved state-of-the-art performance in comparison with other methods (FLAMES, L2G, and cS2G). CALDERA outputs causal gene probabilities for all genes in a given GWAS locus and we show that these probabilities are well-calibrated. Applying CALDERA to 93 UK Biobank traits, we predicted 11,956 putative causal genes, potentially resolving up to 52% of loci. Overall, CALDERA provides a powerful solution for prioritizing potentially causal genes in GWAS loci that minimizes the data processing required to construct input features and generates an easily-interpretable output score.
Year of Publication	2026
Journal	PLoS genetics
Volume	22
Issue	3
Pages	e1012079
Date Published	03/2026
ISSN	1553-7404
DOI	10.1371/journal.pgen.1012079
PubMed ID	41843578