Comparison of variant callers using 60 532 multi-ancestry whole genome sequences.

Briefings in bioinformatics
Authors
Keywords
Abstract

Whole genome sequencing (WGS) studies play a pivotal role in understanding the genetic underpinnings of human diseases and traits. High-quality, reproducible variant calling is the cornerstone of successful downstream analyses, including WGS association studies and polygenic risk prediction. This paper compares the data quality, performance, and concordance of two widely used WGS variant callers, the Genome Analysis Toolkit (GATK) and the Variant Tool set that discovers short variants (VT), using 60 532 multi-ancestry whole genomes sequenced by the Centers for Common Disease Genomics (CCDGs) of the NHGRI Genome Sequencing Program. Our findings show that, after quality control (QC), both the GATK and VT pipelines yield highly consistent and reliable single nucleotide variant (SNV) calls in large-scale WGS studies, supporting their agreement in joint variant calling. However, the two pipelines exhibit greater discrepancies in calling insertions and deletions (INDELs).

Year of Publication
2026
Journal
Briefings in bioinformatics
Volume
27
Issue
2
Date Published
03/2026
ISSN
1477-4054
DOI
10.1093/bib/bbag130
PubMed ID
41894165