Evaluating Integrative Strategies for Incorporating Phenotypic Features in Spatial Transcriptomics.

ArXiv

Authors	Levin Moser, Ahmad Hamid, Esteban Miglietta, Nodar Gogoberidze, Beth Cimini
Keywords	Spatial transcriptomics, cell type deconvolution, morphological features, multi-modal integration, variational autoencoder
Abstract	The key advantage of spatial transcriptomics (ST) technologies lies in the spatial domain: these techniques not only offer an unprecedented opportunity to interrogate intact biological samples in a spatially informed manner, but also set the stage for integration with other imaging-based modalities. However, how to most effectively exploit spatial context and integrate ST with imaging-based modalities that capture morphological insight remains an open and heavily investigated question. To address this, particularly under real-world experimental constraints such as limited dataset size, class imbalance, and bounding-box-based segmentation, we used a publicly available murine ileum Multiplexed Error-Robust Fluorescence In Situ Hybridization (MERFISH) dataset to evaluate whether a minimally tuned variational autoencoder (VAE) could extract informative low-dimensional representations from cell crops of spot counts, nuclear stain, membrane stain, or a combination thereof. We assessed the resulting embeddings through PERMANOVA, cross-validated classification, and unsupervised Leiden clustering, and compared them to classical image-based feature vectors extracted via CellProfiler. While transcript counts (TC) generally outperformed other feature spaces, the VAE-derived latent spaces (LSs) captured meaningful biological variation and enabled improved label recovery for specific cell types. LS2, in particular, trained solely on morphological input, also exhibited moderate predictive power for a handful of genes in a ridge regression model. Notably, combining TC with LSs through multiplex clustering led to consistent gains in cluster homogeneity, a trend that also held when augmenting only subsets of TC with the stain-derived LS2. In contrast, CellProfiler-derived features failed to match the performance of the LSs, highlighting the advantage of learned representations over hand-crafted features. 
Collectively, these findings demonstrate that even under constrained conditions, VAEs can extract biologically meaningful signals from imaging data and constitute a promising strategy for multi-modal integration.
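The abstract reports gains in cluster homogeneity when TC are combined with the LSs. The sketch below is only an illustration, not the authors' code. It assumes the standard entropy-based definition used by scikit-learn's `homogeneity_score`: h = 1 − H(C|K)/H(C), where C is the reference cell-type labels and K is the cluster assignment (for example, from Leiden). The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(classes, clusters):
    """H(C|K): uncertainty about the class that remains once the cluster is known."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))  # counts n_kc per (cluster, class) pair
    cluster_sizes = Counter(clusters)        # counts n_k per cluster
    return -sum(
        (n_kc / n) * log(n_kc / cluster_sizes[k])
        for (k, _c), n_kc in joint.items()
    )


def homogeneity(classes, clusters):
    """Return 1 - H(C|K)/H(C); 1.0 means every cluster contains a single class."""
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    h_c = entropy(classes)
    if h_c == 0.0:
        return 1.0  # only one class present, so any clustering is trivially homogeneous
    return 1.0 - conditional_entropy(classes, clusters) / h_c


# Hypothetical toy example: four cells with reference labels.
cell_types = ["enterocyte", "enterocyte", "goblet", "goblet"]
print(homogeneity(cell_types, [0, 0, 1, 1]))  # → 1.0 (each cluster is pure)
print(homogeneity(cell_types, [0, 0, 0, 0]))  # → 0.0 (one cluster mixes both types)
```

Homogeneity alone rewards over-splitting. Assigning every cell its own cluster also scores 1.0. For that reason it is usually read alongside completeness, or combined with it as the V-measure, when comparing clusterings of different feature spaces.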
Year of Publication	2025
Journal	ArXiv
Date Published	07/2025
ISSN	2331-8422
PubMed ID	40766882
