SPAmix: a scalable, accurate, and universal analysis framework for large-scale genetic association studies in admixed populations.
| Authors | |
| Abstract | BACKGROUND: Inclusion of individuals with diverse or admixed genetic ancestries is crucial to discover novel findings that may be missed by genomic analyses rooted solely in European populations. RESULTS: Here, we present an analysis framework, SPAmix, which is scalable to large-scale biobank data analysis including hundreds of thousands of admixed individuals and is universally applicable to various types of complex traits, including quantitative traits, time-to-event traits, ordinal traits, and longitudinal traits. Since no alternative model is fitted, SPAmix primarily focuses on association p values. For each genetic variant, SPAmix uses genotype data and genetic principal components to estimate individual-specific allele frequencies, which are subsequently used to calibrate p values via a retrospective analysis. A hybrid strategy including saddlepoint approximation (SPA) can greatly increase accuracy when analyzing rare genetic variants, especially if the phenotypic distribution is unbalanced or extremely unbalanced. We also propose SPAmix to incorporate local ancestry to calculate ancestry-specific p values. To maximize statistical power, SPAmix is proposed to combine the p values of SPAmix and SPAmix via Cauchy combination. CONCLUSIONS: The SPAmix-based approaches are more accurate than Tractor in addressing phenotypic variance heterogeneity among ancestries when analyzing quantitative traits and in addressing an unbalanced case-control ratio when analyzing binary traits. SPAmix is an optimal unified approach for various cross-ancestry genetic architectures. Extensive simulation studies and real data analyses of 369,314 UK Biobank individuals from multiple ancestries demonstrated that SPAmix is scalable and can discover novel hits while controlling type I error rates well. |
| Year of Publication | 2025 |
| Journal | Genome Biology |
| Volume | 26 |
| Issue | 1 |
| Pages | 356 |
| Date Published | 10/2025 |
| ISSN | 1474-760X |
| DOI | 10.1186/s13059-025-03827-9 |
| PubMed ID | 41102819 |