Algorithms for reconstructing tumor evolution / Intro to Dirichlet Processes
University of Toronto
Algorithms for reconstructing tumor evolution
Abstract: Tumors contain genetically heterogeneous cancerous subpopulations that can differ in their metastatic potential and response to treatment. Over the past few years, our work has focused on computational and statistical methods that reconstruct the phylogeny and the full genotypes of these subpopulations from high-throughput sequencing data of tumor samples.
Tumor subpopulations can be partially characterised by identifying tumor-associated somatic variants using short-read sequencing. Subsequent inference of copy number variants, or clustering of the variant allele frequencies (VAFs), can reveal the number of major subpopulations present in the tumor as well as the set of mutations that first appear in each subpopulation. Further analysis, and often different data, is needed to determine how the subpopulations relate to one another and whether they share any mutations. Ideally, this analysis would reconstruct the full genotypes of each subpopulation.
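A minimal Python sketch of how VAF clustering hints at subpopulations, under stated assumptions: every SNV is heterozygous and sits in a copy-neutral diploid region, so a mutation carried by a fraction φ of the cells in the sample has an expected VAF of about φ/2. The read counts, the function names, and the gap-based grouping rule (with its 0.15 threshold) are all hypothetical stand-ins for illustration; practical tools use probabilistic mixture models over read counts instead.

```python
from typing import List


def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele frequency: fraction of reads at the locus carrying the variant."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this locus")
    return alt_reads / total


def cellular_prevalence(v: float) -> float:
    """Fraction of sampled cells carrying the SNV, ASSUMING a heterozygous mutation
    in a copy-neutral diploid region (each carrier cell has 1 of 2 alleles mutated,
    so VAF ~= prevalence / 2). Capped at 1 to absorb sampling noise."""
    return min(1.0, 2.0 * v)


def group_by_gap(values: List[float], gap: float = 0.15) -> List[List[float]]:
    """Hypothetical naive 1-D clustering: sort the values and start a new group
    whenever two consecutive values differ by more than `gap`."""
    groups: List[List[float]] = []
    for x in sorted(values):
        if groups and x - groups[-1][-1] <= gap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


# Made-up (alt, ref) read counts for five SNVs; not real data.
reads = [(48, 52), (51, 49), (25, 75), (22, 78), (10, 90)]
prevalences = [cellular_prevalence(vaf(alt, ref)) for alt, ref in reads]
print(group_by_gap(prevalences))
```

With these counts the five mutations fall into three groups (prevalences near 0.2, 0.5, and 1.0), suggesting three populations of cells. Note that the grouping says which mutations co-occur in frequency, not how the groups are related; that is the question the phylogeny addresses.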
I will describe my lab’s efforts to recover these full genotypes by reconstructing the tumor’s evolutionary history. We do this by fitting subpopulation phylogenies to the VAFs. In some circumstances a full reconstruction is possible, but often multiple phylogenies are consistent with the data. We have developed a number of methods (PhyloSub, PhyloWGS, treeCRP, PhyloSpan) that use Bayesian inference in non-parametric models to distinguish ambiguous from unambiguous portions of the phylogeny, thereby explicitly representing reconstruction uncertainty. Our methods consider both single nucleotide variants and copy number variations, and can also be applied to data on pairs of mutations.
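To illustrate why several phylogenies can fit the same VAFs, here is a small brute-force Python sketch. It assumes the commonly used "sum rule" under the infinite-sites assumption: a subpopulation's cellular prevalence must be at least the summed prevalence of its child subpopulations. The root label `R`, the cluster names, their prevalences, and all function names are hypothetical; the methods named above use Bayesian inference over trees rather than exhaustive enumeration like this.

```python
from itertools import product
from typing import Dict, List

ROOT = "R"  # hypothetical label for the unmutated root population (prevalence 1)


def is_tree(parent: Dict[str, str]) -> bool:
    """True if following parent pointers from every node reaches ROOT without a cycle."""
    for start in parent:
        seen = set()
        node = start
        while node != ROOT:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


def satisfies_sum_rule(parent: Dict[str, str], prev: Dict[str, float],
                       tol: float = 1e-9) -> bool:
    """Each node's prevalence must be >= the summed prevalence of its children."""
    child_sum = {node: 0.0 for node in prev}
    for child, par in parent.items():
        child_sum[par] += prev[child]
    return all(child_sum[node] <= prev[node] + tol for node in prev)


def consistent_trees(prev: Dict[str, float]) -> List[Dict[str, str]]:
    """Try every parent assignment; keep the valid trees obeying the sum rule."""
    clusters = [c for c in prev if c != ROOT]
    options = [[p for p in [ROOT] + clusters if p != c] for c in clusters]
    trees = []
    for choice in product(*options):
        parent = dict(zip(clusters, choice))
        if is_tree(parent) and satisfies_sum_rule(parent, prev):
            trees.append(parent)
    return trees


# Hypothetical cluster prevalences, e.g. from a VAF clustering step.
prevalences = {ROOT: 1.0, "A": 0.9, "B": 0.4, "C": 0.3}
for tree in consistent_trees(prevalences):
    print(tree)  # child -> parent mapping
```

For these prevalences two trees survive: a chain R → A → B → C, and a branching tree in which B and C are both children of A. The prevalences alone cannot tell them apart, which is the kind of ambiguity a posterior over trees is meant to represent. Brute force scales exponentially with the number of clusters, so it only serves to make the constraint concrete.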
David Benjamin
Data Sciences & Data Engineering Primer: Intro to Dirichlet Processes
Abstract: At a mundane level, Dirichlet processes are a clustering method that also infers the number of clusters. However, they are also a way to do Bayesian inference on a single infinite model rather than ad hoc model selection over a series of finite models, and they are the gateway to the field of Bayesian non-parametric models. Many introductions to Dirichlet processes take a formal, measure-theoretic approach. In contrast, if you understand the multinomial distribution, you will understand this primer.
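One way to see the link to the multinomial: take a finite mixture with K clusters and a symmetric Dirichlet(α/K, …, α/K) prior on the cluster weights. After integrating out the weights, the probability that the next item (with i items already assigned, n_k of them to cluster k) joins cluster k is (n_k + α/K)/(i + α). As K → ∞, an occupied cluster gets n_k/(i + α) and all empty clusters together get α/(i + α). That limit is the Chinese restaurant process, the clustering behaviour of a Dirichlet process with concentration α. The Python sketch below samples cluster labels from it; the function name and the seed parameter are hypothetical, and this is not necessarily how the primer presents it.

```python
import random
from typing import List


def sample_crp(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw cluster labels for n items from a Chinese restaurant process.

    With i items already assigned, the next item joins existing cluster k with
    probability counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha). The weights below sum to i + alpha, so random.choices
    normalizes them to exactly these probabilities.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of items in cluster k
    labels: List[int] = []
    for _ in range(n):
        weights = counts + [alpha]  # existing clusters first, then "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


print(sample_crp(20, alpha=1.0))
```

Larger α makes new clusters more likely, so the number of clusters is not fixed in advance but grows slowly (roughly logarithmically) with n. This is only the prior over partitions; a full Dirichlet process mixture would also attach parameters and a likelihood to each cluster and infer the labels from data, for example by Gibbs sampling.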