Primer: Generalized linear models and latent factor models
Engelhardt Group, Princeton University
Generalized linear models (GLMs) are widely used in the statistical analysis of data with non-normally distributed errors. Examples include logistic regression for binary outcomes and negative binomial regression for overdispersed counts. In this primer, we review the fundamental components of the GLM as well as standard algorithms for estimating the unknown parameters. GLMs are a form of supervised learning: they describe the effect of one or more predictor variables X on a single outcome variable Y. However, many modern datasets consist of large numbers of measurements that are all jointly of interest, and for these unsupervised learning is more appropriate. Latent factor models such as principal component analysis are a popular approach to dimension reduction in this setting. We examine their basic properties from a probabilistic perspective. This lays the foundation for generalized bilinear models, which enable latent factor models to be fit to non-Gaussian data, just as GLMs are in the supervised setting.
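To make the supervised versus unsupervised contrast concrete, the following is a minimal sketch in Python with NumPy, not the primer's own implementation. The first part fits a logistic regression of Y on X by iteratively reweighted least squares (IRLS), which is one commonly used GLM fitting algorithm; the text above does not say which algorithms the primer covers. The second part computes a principal component analysis of a data matrix by singular value decomposition. All names (fit_logistic_irls, pca_svd), the simulated data, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Logistic regression via IRLS (Newton's method on the log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link: E[Y | X]
        w = mu * (1.0 - mu)                   # Bernoulli variance, used as weights
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (w[:, None] * X)         # Fisher information X' W X
        delta = np.linalg.solve(hess, grad)   # Newton step
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta


def pca_svd(Y, k):
    """Rank-k PCA of Y: returns (scores, loadings, column means)."""
    center = Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Y - center, full_matrices=False)
    scores = U[:, :k] * S[:k]                 # latent factors, one row per sample
    loadings = Vt[:k]                         # one row per component
    return scores, loadings, center


rng = np.random.default_rng(0)

# Supervised: predictors X (with intercept column) and a binary outcome y.
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])             # assumed values for the simulation
prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, prob).astype(float)
beta_hat = fit_logistic_irls(X, y)

# Unsupervised: many measurements per sample, no designated outcome.
F = rng.normal(size=(200, 2))                 # simulated latent factors
L = rng.normal(size=(2, 10))                  # simulated loadings
Y = F @ L + 0.01 * rng.normal(size=(200, 10))
scores, loadings, center = pca_svd(Y, k=2)
Y_approx = scores @ loadings + center         # low-rank reconstruction of Y
```

In the first part the data enter as a predictor matrix and a single outcome vector, while in the second part every column of Y is treated symmetrically and the low-dimensional scores are estimated rather than observed.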