Inferring compound heterozygosity from large-scale exome sequencing data.

bioRxiv : the preprint server for biology
Authors
Abstract

Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n=125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
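The abstract does not spell out the estimator, so the following is only a minimal sketch of one standard way to infer phase from population genotype counts: an expectation-maximization (EM) estimate of two-variant haplotype frequencies, followed by the probability that a double heterozygote carries the variants in trans. The function names, the 3×3 count layout, and the example counts are all hypothetical and chosen for illustration; they are not taken from gnomAD or from the authors' implementation.

```python
def estimate_haplotype_freqs(n, iterations=100):
    """EM estimate of haplotype frequencies for a pair of variants.

    n[i][j] is the number of individuals carrying i alternate alleles
    of variant 1 and j alternate alleles of variant 2 (i, j in 0..2).
    This layout is an assumption made for this sketch.
    """
    total_haps = 2 * sum(sum(row) for row in n)
    if total_haps == 0:
        raise ValueError("no individuals in the count table")

    # Haplotype counts from genotypes whose phase is unambiguous
    # (every genotype except the double heterozygote n[1][1]).
    base = {
        "ref_ref": 2 * n[0][0] + n[1][0] + n[0][1],
        "alt_ref": n[1][0] + 2 * n[2][0] + n[2][1],
        "ref_alt": n[0][1] + 2 * n[0][2] + n[1][2],
        "alt_alt": n[2][1] + n[1][2] + 2 * n[2][2],
    }
    freqs = dict.fromkeys(base, 0.25)  # uninformative starting point

    for _ in range(iterations):
        # E-step: split double heterozygotes between trans and cis.
        trans = freqs["alt_ref"] * freqs["ref_alt"]
        cis = freqs["alt_alt"] * freqs["ref_ref"]
        frac_trans = trans / (trans + cis) if trans + cis > 0 else 0.5
        n_trans = n[1][1] * frac_trans
        n_cis = n[1][1] - n_trans

        # M-step: re-estimate frequencies from the completed counts.
        counts = {
            "ref_ref": base["ref_ref"] + n_cis,
            "alt_ref": base["alt_ref"] + n_trans,
            "ref_alt": base["ref_alt"] + n_trans,
            "alt_alt": base["alt_alt"] + n_cis,
        }
        freqs = {hap: c / total_haps for hap, c in counts.items()}
    return freqs


def probability_in_trans(freqs):
    """Probability that a double heterozygote carries the variants in trans.

    Returns None when the haplotype frequencies carry no information.
    """
    trans = freqs["alt_ref"] * freqs["ref_alt"]
    cis = freqs["alt_alt"] * freqs["ref_ref"]
    if trans + cis == 0:
        return None
    return trans / (trans + cis)


# Made-up counts: the two variants are never seen in the same person.
never_together = [[9980, 10, 0], [10, 0, 0], [0, 0, 0]]
# Made-up counts: the two variants are only ever seen together.
always_together = [[9980, 0, 0], [0, 20, 0], [0, 0, 0]]

p_never = probability_in_trans(estimate_haplotype_freqs(never_together))
p_always = probability_in_trans(estimate_haplotype_freqs(always_together))
```

In this toy setup, variants that never co-occur in the reference population yield a high in-trans probability, while variants that only appear together yield a probability near zero, consistent with their sitting on the same haplotype.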

Year of Publication
2023
Journal
bioRxiv : the preprint server for biology
Date Published
08/2023
DOI
10.1101/2023.03.19.533370
PubMed ID
36993580