Proteogenomics-enabled discovery of novel small open reading frame (sORF)-encoded polypeptides in human and mouse tissues.

Nucleic Acids Research
Authors
Abstract

Small open reading frames (sORFs) encode an emerging class of functional proteins shorter than 100 amino acids. However, sORFs remain incompletely characterized in mice and humans. The development of proteomics and Ribo-seq techniques has enabled the discovery of a number of sORF-encoded peptides (SEPs), but previous proteogenomics studies have been limited to a few cell lines or tissues. Given these limitations, a potentially vast number of sORFs remains to be discovered. We collected community-scale, previously published proteomics data, including one billion experimental spectra derived from a wide range of mouse and human tissues, to identify novel sORFs and to reveal the tissue expression status of novel and recently annotated sORF-encoded proteins. We detected several novel sORFs in specific tissues, including a conserved protein-coding upstream overlapping ORF in HNRNPUL2 expressed in human lymphocytes, which may have important biological functions. This work introduces a simple and efficient filtration strategy to detect novel sORFs. Our workflow will likely prove useful for future studies on sORFs in humans and other animals.

Year of Publication
2025
Journal
Nucleic Acids Research
Volume
53
Issue
14
Date Published
07/2025
ISSN
1362-4962
DOI
10.1093/nar/gkaf687
PubMed ID
40716779