Scalable automated reanalysis of genomic data in research and clinical rare disease cohorts.

medRxiv : the preprint server for health sciences
Abstract

Reanalysis of genomic data in rare disease is highly effective in increasing diagnostic yield but remains limited by its reliance on manual approaches. Automation and optimization for high specificity will be necessary to ensure the scalability, adoption and sustainability of iterative reanalysis. We developed a publicly available automated tool, Talos, and validated its performance using data from 1,089 individuals with rare genetic disease. Trio-based analysis identified 86% of known in-scope diagnoses while returning one variant per case on average. The variant burden fell to one variant per 200 cases in iterative monthly reanalysis cycles. Application to an unselected cohort of 4,735 undiagnosed individuals identified 248 diagnoses (5.2% yield): 73 (29%) due to new gene-disease relationships, 56 (23%) due to new variant-level evidence, and 119 (48%) due to improved filtering and analysis strategies. Our automated, iterative reanalysis model, applied to thousands of rare disease patients, demonstrates the feasibility of delivering frequent, systematic reanalysis at scale.
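
The yield figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the counts stated in the abstract, and the variable and function names (`diagnoses_by_reason`, `share`) are hypothetical and not part of Talos.

```python
# Counts as reported in the abstract; all names here are illustrative.
cohort_size = 4735  # undiagnosed individuals in the unselected cohort
diagnoses_by_reason = {
    "new gene-disease relationship": 73,
    "new variant-level evidence": 56,
    "improved filtering and analysis": 119,
}

# Total diagnoses and overall diagnostic yield across the cohort.
total_diagnoses = sum(diagnoses_by_reason.values())
overall_yield = total_diagnoses / cohort_size


def share(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Percentage of all diagnoses attributable to each reason.
shares = {
    reason: share(count, total_diagnoses)
    for reason, count in diagnoses_by_reason.items()
}

print(f"{total_diagnoses} diagnoses, overall yield {overall_yield:.1%}")
for reason, pct in shares.items():
    print(f"{reason}: {pct}%")
```

The three categories sum to the 248 reported diagnoses, and the rounded shares match the percentages given in the abstract.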

Year of Publication
2025
Journal
medRxiv : the preprint server for health sciences
Date Published
05/2025
DOI
10.1101/2025.05.19.25327921
PubMed ID
40661289