The Molecular Signatures Database (MSigDB) hallmark gene set collection.

Cell Syst
Authors
Abstract

The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
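
To make the notion of redundancy across gene sets concrete, the following is a minimal sketch that scores pairwise overlap between gene sets with the Jaccard index and flags highly overlapping pairs. It is an illustration only, not the procedure used to build the hallmark collection: the set names, their gene memberships, the `jaccard` and `redundant_pairs` helpers, and the 0.5 threshold are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(gene_sets, threshold=0.5):
    """Return (name1, name2, score) for every pair at or above the threshold.

    `gene_sets` maps a set name to a collection of gene symbols.
    The threshold is an arbitrary illustrative cutoff.
    """
    pairs = []
    for n1, n2 in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[n1], gene_sets[n2])
        if score >= threshold:
            pairs.append((n1, n2, score))
    return pairs


# Toy gene sets, invented for illustration (not actual MSigDB sets).
toy_sets = {
    "SET_A": {"TP53", "CDKN1A", "MDM2", "BAX"},
    "SET_B": {"TP53", "CDKN1A", "MDM2", "GADD45A"},
    "SET_C": {"HIF1A", "VEGFA", "LDHA"},
}

if __name__ == "__main__":
    for n1, n2, score in redundant_pairs(toy_sets):
        print(f"{n1} and {n2} overlap with Jaccard index {score:.2f}")
```

In this toy input, SET_A and SET_B share three of five distinct genes and are flagged as overlapping, while SET_C shares no genes with either. Pairs flagged this way would be candidates for closer inspection, not automatic merging.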

Year of Publication
2015
Journal
Cell Syst
Volume
1
Issue
6
Pages
417-425
Date Published
2015 Dec 23
ISSN
2405-4712
DOI
10.1016/j.cels.2015.12.004
PubMed ID
26771021
PubMed Central ID
PMC4707969
Grant list
R01 CA121941 / CA / NCI NIH HHS / United States
R01 CA154480 / CA / NCI NIH HHS / United States
R01 GM074024 / GM / NIGMS NIH HHS / United States
U54 CA112962 / CA / NCI NIH HHS / United States