Primer: Capturing structure in high-dimensional data using K nearest neighbor graphs
Harvard Medical School
K nearest neighbor graphs (KNNs) are ubiquitous in high-dimensional data analysis and play a key role in diverse techniques, including classification, clustering, non-linear dimensionality reduction, data integration, and trajectory inference. But what makes KNNs so powerful and broadly useful? In this primer, I will provide a general overview of KNNs, emphasizing how they help overcome the curse of dimensionality to efficiently represent meaningful structure in high-dimensional data. I will use the MNIST database of handwritten digits as an easy-to-visualize example to illustrate the mechanics of constructing a KNN and diffusing signal along it. This will help set the stage for the main talk on Co-varying Neighborhood Analysis, which employs KNNs to identify axes of sample-level variability, with fine granularity, in single-cell genomics datasets.
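As a rough illustration of the two mechanics mentioned above (building a KNN and diffusing signal along it), here is a minimal sketch in plain Python. It is not the code used in the talk: the synthetic two-cluster data stands in for MNIST, and the function names `knn_graph` and `diffuse`, the Euclidean metric, the unsymmetrized (directed) graph, and the simple neighbor-averaging rule are all assumptions made for this example.

```python
import math
import random


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean).

    Brute-force O(n^2) search; the graph is directed and not symmetrized.
    """
    neighbors = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def diffuse(signal, neighbors, steps=1):
    """Smooth a per-node signal along the graph.

    At each step, a node's value becomes the mean over itself and its neighbors.
    """
    values = list(signal)
    for _ in range(steps):
        values = [
            (values[i] + sum(values[j] for j in nbrs)) / (len(nbrs) + 1)
            for i, nbrs in enumerate(neighbors)
        ]
    return values


# Toy stand-in for MNIST (an assumption): two well-separated clusters in 50 dimensions.
rng = random.Random(0)
dim = 50
cluster_a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]
cluster_b = [[rng.gauss(10.0, 1.0) for _ in range(dim)] for _ in range(20)]
points = cluster_a + cluster_b

neighbors = knn_graph(points, k=5)

# Place all signal on one point in cluster A, then let it spread along the graph.
signal = [0.0] * len(points)
signal[0] = 1.0
smoothed = diffuse(signal, neighbors, steps=3)
```

In this toy setting every point's neighbors lie in its own cluster, so the signal spreads within cluster A and never reaches cluster B. That is the sense in which diffusion along the graph respects the structure of the data rather than raw distances in the ambient space.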