Enhancing Type 1 Diabetes Polygenic Risk Prediction Through Neural Networks and Entropy-Derived Insights.

International journal of molecular sciences
Abstract

Type 1 diabetes (T1D) is an autoimmune disease with a strong genetic component (~70% heritability). Early identification of individuals at risk is crucial for early intervention and risk assessment. Although polygenic risk scores (PRS) have shown promise in risk assessment, most current approaches remain constrained by linear assumptions and limited generalizability. We aimed to develop a neural network-driven classifier using T1D-associated single nucleotide polymorphisms (SNPs). In addition, we explored the inclusion of an entropy-derived feature as a complementary variable, representing the degree of genetic variability within an individual's genotype profile across the 67 T1D-associated SNPs, to evaluate its potential additive contribution to model performance. We analyzed genotype data from 11,909 individuals in the UK Biobank (546 T1D cases and 11,363 controls). Sixty-seven well-known SNPs associated with T1D were used as inputs to the model, using two distinct allele-encoding strategies. A feed-forward neural network was evaluated under varying case-control ratios through five-fold cross-validation. Performance was assessed using the area under the receiver operating characteristic curve (AUC) on a held-out test set and on an external European validation cohort. Across five-fold cross-validation, the best configuration achieved a median AUC of 0.903. On the held-out UK Biobank test set, the model generalized well, with an AUC of 0.8889 (95% CI: 0.8516-0.9262). A probability-based risk framework, constructed using five risk groups ("very low", "low", "intermediate", "high", and "very high" risk), yielded a negative predictive value (NPV) of 98.9% for the "very low" risk group and a positive predictive value (PPV) of 61.9% with a specificity of 97.3% for the "very high" risk group, assuming a 10% T1D prevalence.
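Because PPV and NPV depend on the assumed prevalence, the dependence can be illustrated with a minimal Python sketch based on Bayes' rule. It treats a risk group as a simple binary threshold, which may differ from how the study computed its group-level values. The function name `ppv_npv` and the sensitivity of 0.40 are hypothetical and are not taken from the abstract; only the specificity (97.3%) and the prevalence (10%) come from the text above.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a binary rule at an assumed prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence                   # true positives per unit population
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positives
    tn = specificity * (1.0 - prevalence)           # true negatives
    fn = (1.0 - sensitivity) * prevalence           # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Specificity and prevalence are from the abstract; the sensitivity is an
# illustrative assumption, not a reported result.
ppv, npv = ppv_npv(sensitivity=0.40, specificity=0.973, prevalence=0.10)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

Rerunning the function with a lower prevalence shows how quickly PPV falls for the same sensitivity and specificity, which is why the assumed prevalence has to be stated alongside the predictive values.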
External validation in the German Diabetes Study reproduced clear case-control separation; for individuals with recent-onset diabetes who were positive for glutamic acid decarboxylase antibodies (GADA+) vs. controls, specificity reached 91.9% in the "high" risk group (PPV of 94.3%) and 97.6% in the "very high" risk group (PPV of 95.7%). The proposed neural network reliably predicts T1D genetic risk using a compact panel of 67 SNPs and maintains accuracy in both internal and external European cohorts. Its probabilistic output enables clinically interpretable risk thresholds, while entropy features contributed modestly to performance. These results demonstrate that a neural network-based approach achieves discriminative performance comparable to established T1D genetic risk models, while offering flexible probability-based risk stratification and architectural extensibility for future integration of additional features.
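The abstract does not define the entropy-derived feature beyond "the degree of genetic variability within an individual's genotype profile". One plausible reading, sketched below in Python, is the Shannon entropy of the distribution of genotype classes (0, 1, or 2 copies of the effect allele) across an individual's SNP panel. This is an assumption for illustration, not the authors' stated method; the function name `genotype_entropy` and the additive 0/1/2 encoding are hypothetical.

```python
import math
from collections import Counter


def genotype_entropy(genotypes):
    """Shannon entropy (bits) of the genotype-class distribution for one individual.

    `genotypes` is a sequence of allele counts (0, 1, or 2), one per SNP.
    This definition is an assumed reading of the entropy feature.
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    # Sum p * log2(p) over the genotype classes actually observed.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A profile with a single genotype class has zero entropy; an even mix of
# all three classes reaches the maximum of log2(3) bits.
uniform_profile = [0, 1, 2] * 22
print(genotype_entropy(uniform_profile))
```

Under this reading, the feature is a single scalar per individual that could be appended to the SNP inputs of the network.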

Year of Publication
2026
Journal
International journal of molecular sciences
Volume
27
Issue
7
Date Published
03/2026
ISSN
1422-0067
DOI
10.3390/ijms27072966
PubMed ID
41977154