Deep learning-based stratification of Schizophrenia Spectrum Disorder from real-world data reveals distinct profiles of common and rare variant genetic signal.
| Authors | |
| Abstract | Schizophrenia spectrum disorder (SSD) is a clinically and genetically heterogeneous condition, yet few studies have integrated real-world clinical data with both common and rare genetic variation to explore this complexity. In this study, we analyzed real-world data from 22,092 individuals in the Danish iPSYCH cohort (11,046 SSD cases and 11,046 matched population controls), leveraging nationwide registry data on diagnoses, hospitalizations, and parental history. Using a variational autoencoder (VAE), we compressed these features into a latent space and identified ten clinically distinct SSD subgroups that varied in comorbidity, parental diagnoses, hospital burden, and early-life adversity. Polygenic scores (PGSs) for five psychiatric disorders showed subgroup-specific enrichment, highlighting potential links between complex clinical profiles and common variant liability. In a subset with exome data (N=5,969), we assessed rare deleterious variant burden across schizophrenia (SCZ)-informed gene sets and protein-protein interaction (PPI) networks, observing suggestive network-specific trends. This framework for integrating real-world data-based stratification with genetic evidence is scalable and transferable across cohorts, offering a path toward biologically informed patient classification. |
| Year of Publication | 2026 |
| Journal | medRxiv : the preprint server for health sciences |
| Date Published | 04/2026 |
| DOI | 10.64898/2026.03.30.26349393 |
| PubMed ID | 41959828 |
| Links | |