Primer: Contrastive PCA
Depts. of Biomedical Data Science, Computer Science, and Electrical Engineering, Stanford University
Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in a single dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. We propose a new method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available and can be used for exploratory data analysis in applications where PCA is currently used.
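To make the idea of structure "enriched in a dataset relative to comparison data" concrete, the following is a minimal sketch, not the publicly available implementation. It assumes the usual formulation of cPCA, which this abstract does not spell out: take the leading eigenvectors of the target covariance minus a contrast weight times the background covariance. The function name `contrastive_pca`, the parameter `alpha`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Illustrative cPCA sketch (assumed formulation, hypothetical helper).

    Returns the target data projected onto the top eigenvectors of
    cov(target) - alpha * cov(background), together with those directions.
    """
    # Center each dataset with its own mean.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)

    # Feature-by-feature sample covariance matrices.
    cov_target = np.cov(target_c, rowvar=False)
    cov_background = np.cov(background_c, rowvar=False)

    # The contrast matrix is symmetric, so eigh applies.
    contrast = cov_target - alpha * cov_background
    eigvals, eigvecs = np.linalg.eigh(contrast)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]
    return target_c @ directions, directions


# Synthetic data (an assumption for illustration): feature 0 carries large
# variance shared by both datasets; feature 1 carries a subgroup shift that
# exists only in the target.
rng = np.random.default_rng(0)
n = 500
scale = np.array([10.0, 1.0, 1.0])
background = rng.normal(size=(n, 3)) * scale
target = rng.normal(size=(n, 3)) * scale
target[: n // 2, 1] += 6.0

# alpha = 0 reduces to ordinary PCA on the target.
_, pca_dirs = contrastive_pca(target, background, alpha=0.0, n_components=1)
proj, cpca_dirs = contrastive_pca(target, background, alpha=2.0, n_components=1)
```

In this toy setting, ordinary PCA (`alpha=0`) should align with the high-variance shared feature, while a positive `alpha` penalizes that shared variance and steers the leading direction toward the target-specific feature. How `alpha` is best chosen is not addressed by this sketch.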