Bayesian Variable Selection & Talk: BVS applied to Bioinformatics
Martin Jankowiak
Broad Institute of MIT and Harvard Meeting: Applications of Bayesian Variable Selection to Bioinformatics
High-dimensional features like RNA expression levels or ATAC-seq peaks are ubiquitous in modern biological datasets. While these kinds of datasets are incredibly rich, extracting reliable insights from high-dimensional data presents a number of significant computational and statistical challenges. In this talk I give an overview of how one approach to high-dimensional statistics, Bayesian variable selection (BVS), can be applied to problems in bioinformatics. In the first application I illustrate how BVS can be used to identify the genetic determinants of differential viral fitness from SARS-CoV-2 genomic surveillance data. In the second application I describe how BVS can be used to pinpoint regulatory elements in CRISPR tiling screen data. I also provide a short demo of millipede, an open source package for BVS.
Martin Jankowiak
Broad Institute of MIT and Harvard Primer: An Introduction to Bayesian Variable Selection
Generalized linear models are a mainstay of applied statistics and data analysis, due in large part to their interpretability. However, applying these tools in the high-dimensional setting, where the number of covariates or features is large, brings a number of computational and statistical challenges. In this primer I give an introduction to Bayesian variable selection, a powerful approach for inferring generalized linear models in the high-dimensional setting that is formulated as a Bayesian model selection problem. One benefit of this approach is that by construction it yields a Posterior Inclusion Probability (PIP) for each feature, an interpretable feature-wise score that encodes the statistical evidence for the importance of that feature in explaining the response variable. I place a special emphasis on comparing Bayesian variable selection to alternative approaches to high-dimensional statistics, including the Lasso and continuous shrinkage priors like the Horseshoe.
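To make the PIP concrete, the following is a minimal sketch that computes exact PIPs for a small linear regression problem by enumerating every subset of features. It is not the millipede implementation and is not taken from the talk. It assumes a Gaussian likelihood with known noise scale `sigma`, an isotropic Gaussian prior with scale `tau` on the coefficients of included features, and an independent prior inclusion probability `h` for each feature. The function name `exact_pips`, the simulated data, and all parameter values are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def exact_pips(X, y, h=0.2, tau=1.0, sigma=1.0):
    """Exact posterior inclusion probabilities via enumeration of all 2^P models.

    Assumed model (illustrative only):
      gamma_j ~ Bernoulli(h)            inclusion indicator for feature j
      beta_j  ~ Normal(0, tau^2)        if gamma_j = 1, else beta_j = 0
      y       ~ Normal(X beta, sigma^2 I)
    """
    N, P = X.shape
    models, log_weights = [], []
    for gamma in itertools.product([0, 1], repeat=P):
        gamma = np.array(gamma, dtype=bool)
        Xg = X[:, gamma]
        # Integrating out beta gives y | gamma ~ Normal(0, sigma^2 I + tau^2 Xg Xg^T).
        cov = sigma**2 * np.eye(N) + tau**2 * (Xg @ Xg.T)
        _, logdet = np.linalg.slogdet(cov)
        quad = y @ np.linalg.solve(cov, y)
        log_marginal = -0.5 * (logdet + quad + N * np.log(2.0 * np.pi))
        # Prior probability of this particular subset of features.
        k = int(gamma.sum())
        log_prior = k * np.log(h) + (P - k) * np.log(1.0 - h)
        models.append(gamma)
        log_weights.append(log_marginal + log_prior)

    # Normalize in log space to obtain posterior model probabilities.
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()

    # PIP_j is the total posterior mass of the models that include feature j.
    return weights @ np.array(models, dtype=float)


# Simulated data: only features 0 and 3 influence the response.
rng = np.random.default_rng(0)
N, P = 100, 6
X = rng.normal(size=(N, P))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=N)

pips = exact_pips(X, y)
for j, pip in enumerate(pips):
    print(f"feature {j}: PIP = {pip:.3f}")
```

Enumeration costs 2^P marginal likelihood evaluations, so this sketch is only practical for a handful of features. In the high-dimensional setting the sum over models has to be approximated, for example by sampling from the posterior over models.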