Detecting novel associations in large data sets.
| Authors | |
| Keywords | |
| Abstract | Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. |
| Year of Publication | 2011 |
| Journal | Science |
| Volume | 334 |
| Issue | 6062 |
| Pages | 1518-1524 |
| Date Published | 2011 Dec 16 |
| ISSN | 1095-9203 |
| URL | |
| DOI | 10.1126/science.1205438 |
| PubMed ID | 22174245 |
| PubMed Central ID | PMC3325791 |
| Links | |
| Grant list | P50 GM068763-09 / GM / NIGMS NIH HHS / United States
P50 GM068763 / GM / NIGMS NIH HHS / United States
T32 GM007753 / GM / NIGMS NIH HHS / United States
U54 GM088558-03 / GM / NIGMS NIH HHS / United States
U54 GM088558 / GM / NIGMS NIH HHS / United States
090532 / Wellcome Trust / United Kingdom
U54GM088558 / GM / NIGMS NIH HHS / United States |