PMCID
PMC13060403

Deep learning-based stratification of Schizophrenia Spectrum Disorder from real-world data reveals distinct profiles of common and rare variant genetic signal.

medRxiv : the preprint server for health sciences
Authors
Abstract

Schizophrenia spectrum disorder (SSD) is a clinically and genetically heterogeneous condition, yet few studies have integrated real-world clinical data with both common and rare genetic variation to explore this complexity. In this study, we analyzed real-world data from 22,092 individuals in the Danish iPSYCH cohort (11,046 SSD cases and 11,046 matched population controls), leveraging nationwide registry data on diagnoses, hospitalizations, and parental history. Using a variational autoencoder (VAE), we compressed these features into a latent space and identified ten clinically distinct SSD subgroups that varied in comorbidity, parental diagnoses, hospital burden, and early-life adversity. Polygenic scores (PGSs) for five psychiatric disorders showed subgroup-specific enrichment, highlighting potential links between complex clinical profiles and common variant liability. In a subset with exome data (N=5,969), we assessed rare deleterious variant burden across schizophrenia-informed gene sets and protein-protein interaction (PPI) networks, observing suggestive network-specific trends. This framework for integrating real-world-based stratification with genetic evidence is scalable and transferable across cohorts, offering a path toward biologically informed patient classification.

Year of Publication
2026
Journal
medRxiv : the preprint server for health sciences
Date Published
04/2026
DOI
10.64898/2026.03.30.26349393
PubMed ID
41959828