DM I. Bayesian logistic regression and mixed models: Revenge of the Gibbs
Our aim is to give background and motivation for Scott's talk next week. Consider SNP association testing against a binary phenotype (disease vs. no disease). While linear regression enjoys very efficient inference, the simplest version falls short for several reasons:
- erroneous hard calls of variants (go with probabilities)
- multiple testing (go Bonferroni, FDR)
- confounding by ancestry, batch effects (go add PCs)
- cryptic relatedness (go full mixed model)
- binary phenotype (go logistic)
- overfitting (go Bayesian)
- nonlinear dependence of phenotype on covariates (go Gaussian process?)
- admixture (go topic model?)
- non-normal distribution of effect sizes (go GMM prior?)
- sparsity (go lasso?)
- epistasis (go neural net?)
- ascertainment bias (go do some research)
- high-dimensional phenotypes, both continuous and categorical (go do some modeling)
We will describe models addressing some of these points, including Bayesian probit, logit, and mixed logit models, and, time permitting, some fancier models mixing continuous and discrete structure. Our emphasis will be on how exponential-family conjugacy makes inference easy via Gibbs sampling in certain cases, whereas its absence leads one toward despair (at least for six more days).
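To make the conjugacy point concrete, here is a minimal sketch of the classic data-augmentation Gibbs sampler for Bayesian probit regression (Albert and Chib, 1993): augmenting with a truncated-normal latent variable makes both full conditionals exact. The simulated data, dimensions, and the prior variance `tau2` are illustrative assumptions, not anything from the talk.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Illustrative simulated design: n samples, p covariates (not real SNP data).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)  # probit-generated labels

# Prior beta ~ N(0, tau2 * I); tau2 is an assumed hyperparameter.
tau2 = 10.0
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # posterior covariance of beta | z (fixed)
L = np.linalg.cholesky(V)

beta = np.zeros(p)
draws = []
for it in range(2000):
    # Step 1: z_i | beta, y_i is N(x_i beta, 1) truncated to be
    # positive iff y_i = 1. Bounds below are standardized (loc=0, scale=1).
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # Step 2: beta | z is exactly N(V X'z, V) -- the conjugate update.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    if it >= 500:  # discard burn-in
        draws.append(beta)

post_mean = np.mean(draws, axis=0)
print(post_mean)
```

For logit the likelihood is not conjugate to a normal prior, which is exactly the despair alluded to above (and which tricks like Pólya-Gamma augmentation were invented to relieve).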