DM I. Bayesian logistic regression and mixed models: Revenge of the Gibbs
Our aim is to give background and motivation for Scott's talk next week. Consider SNP association testing against a binary phenotype (disease vs. no disease). While linear regression enjoys very efficient inference, the simplest version falls short for several reasons:
- erroneous hard calls of variants (go with probabilities)
- multiple testing (go Bonferroni, FDR)
- confounding by ancestry, batch effects (go add PCs)
- cryptic relatedness (go full mixed model)
- binary phenotype (go logistic)
- overfitting (go Bayesian)
- nonlinear dependence of phenotype on covariates (go Gaussian process?)
- admixture (go topic model?)
- non-normal distribution of effect sizes (go GMM prior?)
- sparsity (go lasso?)
- epistasis (go neural net?)
- ascertainment bias (go do some research)
- high-dimensional phenotypes, both continuous and categorical (go do some modeling)
We will describe models addressing some of these points, including Bayesian probit, logit, and mixed logit models, and, time permitting, some fancier models mixing continuous and discrete structure. Our emphasis will be on how exponential-family conjugacy makes inference easy via Gibbs sampling in certain cases, whereas its absence leads one toward despair (at least for six more days).
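To make the conjugacy point concrete, here is a minimal sketch of the classic data-augmentation Gibbs sampler for Bayesian probit regression (Albert and Chib, 1993): augmenting with a truncated-normal latent variable makes both full conditionals exact. The simulated data, dimensions, and the prior variance `tau2` are illustrative assumptions, not anything from the talk.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Illustrative simulated design: n samples, p covariates (not real SNP data).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)  # probit-generated labels

# Prior beta ~ N(0, tau2 * I); tau2 is an assumed hyperparameter.
tau2 = 10.0
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # posterior covariance of beta | z (fixed)
L = np.linalg.cholesky(V)

beta = np.zeros(p)
draws = []
for it in range(2000):
    # Step 1: z_i | beta, y_i is N(x_i beta, 1) truncated to be
    # positive iff y_i = 1. Bounds below are standardized (loc=0, scale=1).
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # Step 2: beta | z is exactly N(V X'z, V) -- the conjugate update.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    if it >= 500:  # discard burn-in
        draws.append(beta)

post_mean = np.mean(draws, axis=0)
print(post_mean)
```

For logit the likelihood is not conjugate to a normal prior, which is exactly the despair alluded to above (and which tricks like Pólya-Gamma augmentation were invented to relieve).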