Molecular grammars of predicted intrinsically disordered regions that span the human proteome.

Cell
Abstract

Intrinsically disordered regions (IDRs) of proteins are defined by molecular grammars. These grammars comprise IDR-specific non-random amino acid compositions and non-random patterning of distinct pairs of amino acid types. Here, we introduce grammars inferred using NARDINI+ (GIN) as a resource that uncovers IDR-specific and IDRome-spanning grammars. Using GIN-enabled analyses, we find that specific IDR features and GIN clusters are associated with distinct biological processes, intracellular localization preferences, specialized molecular functions, and functionalization as assessed by cellular fitness correlations. IDRs with exceptional grammars, defined as sequences with high-scoring non-random features, are harbored in proteins and complexes that enable spatial and temporal sorting of biochemical activities within the nucleus. Overall, GIN can be used to extract sequence-function relationships of individual IDRs or clusters of IDRs, to redesign extant IDRs or design de novo IDRs, to perform evolutionary analyses through the lens of molecular grammars and GIN clusters, and to make sense of IDR-specific disease-associated mutations.

Year of Publication
2025
Journal
Cell
Date Published
11/2025
ISSN
1097-4172
DOI
10.1016/j.cell.2025.10.019
PubMed ID
41232529