Multitask learning approaches to biological network inference: linking model estimation across diverse related datasets
Bonneau Lab, New York University
With the growing availability of biological data, methods that properly integrate data generated across the globe have become essential for extracting reproducible insights into relevant research questions. We developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies, which are challenging to integrate because of technical variation (different dates, handlers, laboratories, protocols, etc.). In this talk, I will first introduce how we currently learn regulatory networks from gene expression data, and then how we extend our methods to learn multiple networks jointly from related datasets through multitask learning. In particular, our method aims to detect weaker patterns that are conserved across datasets while still detecting dataset-unique interactions. In addition, adaptive penalties can be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. Since underlying regulatory mechanisms are often shared across conditions and/or cohorts, we hypothesized that multitask approaches, which draw conclusions from several data sources, would improve the performance of network inference. Using two unicellular model organisms, we show that joint network inference outperforms inference from a single dataset. Finally, we demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets. Given the increasing practice of data sharing in biology, we speculate that cross-study inference methods will become highly valuable in the near future, improving our ability to learn more robust and generalizable hypotheses and concepts.
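The goal of separating conserved from dataset-unique interactions can be illustrated with a "dirty" multitask regression, which is one common way to set up this kind of problem. The abstract does not state the exact objective, so what follows is only a minimal sketch under assumptions and not the method presented in the talk. For one target gene, each dataset k gets its own weight vector over candidate regulators, W[:, k] = S[:, k] + B[:, k]. S carries an elementwise L1 penalty, so it is free to pick up dataset-unique edges. B carries a row-wise L2 (group) penalty, which favors selecting the same regulator in every dataset and so captures conserved edges. The model is fit by proximal gradient descent. All names here are hypothetical: `dirty_multitask_fit`, the penalties `lam_s` and `lam_b`, and the `prior_weight` multipliers that stand in for adaptive, prior-based penalties. The toy data are hypothetical as well.

```python
import math
import random


def soft_threshold(x, t):
    """Lasso proximal operator: shrink scalar x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def dirty_multitask_fit(Xs, ys, lam_s, lam_b, prior_weight=None, n_iter=1000):
    """Hypothetical proximal-gradient fit of W[:, k] = S[:, k] + B[:, k] for one target gene.

    Xs[k] : n_k rows of p regulator values for dataset (task) k.
    ys[k] : n_k expression values of the target gene in dataset k.
    lam_s : weight of the elementwise L1 penalty on S (dataset-specific edges).
    lam_b : weight of the row-wise L2 group penalty on B (edges shared across datasets).
    prior_weight[j][k] : assumed penalty multiplier; values < 1 make regulator j
        cheaper to select in dataset k (e.g. edges supported by prior knowledge).
    """
    K, p = len(Xs), len(Xs[0][0])
    if prior_weight is None:
        prior_weight = [[1.0] * K for _ in range(p)]
    # Step size 1/L. The loss depends only on S + B, so its Lipschitz constant in
    # (S, B) is twice that in W; trace(X^T X / n) upper-bounds the top eigenvalue.
    L = 2.0 * max(sum(v * v for row in X for v in row) / len(X) for X in Xs)
    step = 1.0 / L
    S = [[0.0] * K for _ in range(p)]
    B = [[0.0] * K for _ in range(p)]
    for _ in range(n_iter):
        # Gradient of sum_k ||y_k - X_k w_k||^2 / (2 n_k); identical for S and B.
        G = [[0.0] * K for _ in range(p)]
        for k in range(K):
            n = len(ys[k])
            for row, y in zip(Xs[k], ys[k]):
                resid = sum(row[j] * (S[j][k] + B[j][k]) for j in range(p)) - y
                for j in range(p):
                    G[j][k] += row[j] * resid / n
        for j in range(p):
            # Dataset-specific part: weighted lasso prox, one entry at a time.
            for k in range(K):
                t = step * lam_s * prior_weight[j][k]
                S[j][k] = soft_threshold(S[j][k] - step * G[j][k], t)
            # Shared part: group-lasso prox on regulator j's row across datasets,
            # scaled by the row's mean prior weight (a hypothetical choice).
            z = [B[j][k] - step * G[j][k] for k in range(K)]
            norm = math.sqrt(sum(v * v for v in z))
            t = step * lam_b * sum(prior_weight[j]) / K
            scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
            B[j] = [scale * v for v in z]
    return S, B


# Toy data: regulator 0 acts in both datasets, regulator 1 only in dataset 1.
rng = random.Random(0)
true_w = [[2.0, 2.0], [0.0, 1.5], [0.0, 0.0], [0.0, 0.0]]  # p x K
Xs, ys = [], []
for k in range(2):
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
    Xs.append(X)
    ys.append([sum(r[j] * true_w[j][k] for j in range(4)) + rng.gauss(0, 0.1) for r in X])

S, B = dirty_multitask_fit(Xs, ys, lam_s=0.1, lam_b=0.12)
W = [[S[j][k] + B[j][k] for k in range(2)] for j in range(4)]
for j, w in enumerate(W):
    print(f"regulator {j}: " + ", ".join(f"{v:+.2f}" for v in w))
```

The penalty ratio drives the split between the two components. Take a regulator with equal effect in all K datasets. It costs lam_b·√K per unit of effect in B, but lam_s·K in S. A regulator active in only one dataset costs lam_b in B and lam_s in S. Choosing lam_s < lam_b < √K·lam_s therefore makes B the cheaper home for conserved regulators and S the cheaper home for dataset-unique ones. The demo uses K = 2 with 0.1 < 0.12 < 0.141.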