MarkerMatch: A Proximity-Based Probe-Matching Algorithm for Joint Analysis of Copy-Number Variants from Different Genotyping Arrays.

bioRxiv : the preprint server for biology
Abstract

MOTIVATION: Copy-number variants (CNVs) are a form of genetic structural variation with increasing importance in complex human disorders. Both DNA sequencing and microarray data can be used to call CNVs, which can then be used in association tests, such as tests of association between CNV number and disease status. Unlike genotypes, CNV detection in microarrays requires the use of observed intensity signals at each probe, which limits the imputability for analyses that span multiple array types. Thus far, a consensus set of probes (the intersection encompassing the probes that occur in common on all arrays) has been used to circumvent the problem of differing array-specific sensitivities. This has, however, led to an excessive reduction in the overall sensitivity of CNV calls, as arrays can have an undesirably low overlap of probe sets. To overcome this limitation, we developed MarkerMatch, a proximity-based algorithm that matches probes across different genotyping microarrays to maximize the number of probes considered in the CNV calling algorithm, thereby increasing resolution and sensitivity while preserving precision.

RESULTS: By analyzing CNV calls from 4,906 individuals genotyped across three different arrays (Global Screening Array, Omni2.5 array, and Omni Express Exome array), we show that the MarkerMatch approach improves sensitivity by increasing the density of probes available for CNV calling while maintaining or improving precision relative to the current practice (e.g., use of consensus probes only). We further demonstrate that MarkerMatch exceeds the output from current practice in terms of F1 score, Fowlkes-Mallows index, and Jaccard index. We also optimize the two MarkerMatch parameters and find an optimal setting at 10 kb for one of them, with no clear optimal candidate for the other, indicating that this parameter should be determined on a use-case basis.
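
The abstract gives no implementation details, so the following is only a minimal sketch of what proximity-based probe matching and the three reported agreement scores could look like. It is not the MarkerMatch code. The names `match_probes` and `overlap_metrics`, the greedy nearest-neighbour rule, and the data layout are all hypothetical. The 10 kb default mirrors the optimal setting quoted above, on the assumption that it refers to a maximum matching distance.

```python
from bisect import bisect_left
from math import sqrt
from typing import Dict, List, Tuple

Probe = Tuple[str, int]  # (chromosome, base-pair position); hypothetical layout


def match_probes(reference: List[Probe], target: List[Probe],
                 max_dist: int = 10_000) -> List[Tuple[Probe, Probe]]:
    """Greedily pair each reference probe with the nearest unused target
    probe on the same chromosome, if it lies within max_dist base pairs.

    Simplification (assumption): only the two immediate neighbours in the
    sorted target positions are considered as candidates.
    """
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in target:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    used = set()
    pairs: List[Tuple[Probe, Probe]] = []
    for chrom, pos in sorted(reference):
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        best = None  # (distance, target position)
        for j in (i - 1, i):  # left and right neighbours
            if 0 <= j < len(positions) and (chrom, positions[j]) not in used:
                dist = abs(positions[j] - pos)
                if dist <= max_dist and (best is None or dist < best[0]):
                    best = (dist, positions[j])
        if best is not None:
            used.add((chrom, best[1]))
            pairs.append(((chrom, pos), (chrom, best[1])))
    return pairs


def overlap_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """F1 score, Fowlkes-Mallows index, and Jaccard index computed from
    true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "fowlkes_mallows": sqrt(precision * recall),
        "jaccard": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
    }
```

In this sketch, a consensus-only approach corresponds to `max_dist=0`, where only probes at identical positions are paired. Raising `max_dist` admits nearby probes as well, which is the idea behind the increase in usable probe density that the abstract describes.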

Year of Publication
2025
Journal
bioRxiv : the preprint server for biology
Date Published
07/2025
ISSN
2692-8205
DOI
10.1101/2025.06.30.662249
PubMed ID
40631151