insilicoSV: a flexible grammar-based framework for structural variant simulation and placement.

Bioinformatics (Oxford, England)
Authors
Abstract

SUMMARY: Structural variants (SVs) are key drivers of genetic variation and disease in the genome. Their discovery remains challenging, however, in large part due to the scarcity of validated SV callsets and comprehensive benchmarks, which are essential for method development and evaluation. The growing number of data-driven learning-based approaches for SV discovery, in particular, requires large, diverse, and well-balanced training datasets to achieve reliable performance. To address this need, SV simulation has served as a key tool for assessing method performance and training SV models. However, existing SV simulators only support a fixed and limited set of SV classes and do not provide fine-grained control over the placement of SVs within specific contexts of the genome. Here we present insilicoSV, a versatile framework for SV simulation, which models SVs using a simple and flexible grammar, allowing users to easily define standard and custom arbitrary genome rearrangements, as well as encode genome placement constraints. This design allows insilicoSV to naturally support new and bespoke SV types, such as the complex rearrangements of cancer genomes. In addition to grammar-based modeling, insilicoSV provides built-in support for 26 predefined SV types, placement of user-provided SVs, small variant simulation, streamlined workflows for the simulation of genome evolution and genome mixtures, read simulation, alignment, and visualization. These features enable the creation of comprehensive genomic datasets for a variety of downstream applications, such as in-depth benchmarking of alignment and variant calling methods, as well as training of data-driven learning-based approaches for SV detection.

AVAILABILITY AND IMPLEMENTATION: insilicoSV is available under the MIT license at and .

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
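
To make the grammar-based idea in the summary concrete, the following is a minimal sketch of how a rearrangement could be written as a string of symbols and applied to reference segments. The notation used here (uppercase letters for segments, lowercase letters for inverted copies) and the function name `apply_rearrangement` are assumptions made for illustration only; they are not taken from this record and should not be read as insilicoSV's actual syntax or API.

```python
# Toy illustration of a symbol-based rearrangement grammar.
# Hypothetical notation: each uppercase letter names a reference segment;
# the same letter in lowercase denotes its reverse complement (an inversion).
# Omitting a letter deletes the segment; repeating it duplicates the segment.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_rearrangement(segments: dict, target: str) -> str:
    """Build the rearranged sequence described by `target`.

    segments: maps an uppercase symbol to the sequence of that segment.
    target:   string of symbols giving the order of segments after the event.
    """
    pieces = []
    for symbol in target:
        seq = segments[symbol.upper()]
        if symbol.islower():
            seq = reverse_complement(seq)
        pieces.append(seq)
    return "".join(pieces)


segments = {"A": "AACG", "B": "GGA"}           # reference layout is "AB"
deletion = apply_rearrangement(segments, "B")     # A removed
duplication = apply_rearrangement(segments, "AAB")  # A copied in tandem
inversion = apply_rearrangement(segments, "aB")   # A inverted in place
```

Under this toy notation a new event type needs no new code, only a new target string, which is the kind of flexibility the summary attributes to a grammar-based design.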

Year of Publication
2025
Journal
Bioinformatics (Oxford, England)
Date Published
10/2025
ISSN
1367-4811
DOI
10.1093/bioinformatics/btaf594
PubMed ID
41172269