Building a two-way street between cell biology and machine learning

2023 MIT Jameel Clinic principal investigator Caroline Uhler wrote a comment in Nature on why and how we should build a two-way street between machine learning and cell biology. She calls for new machine learning models that better integrate different types of biological data and uncover causal mechanisms in disease, not just associations.


With biomedical sciences quickly outgrowing many other application areas in terms of data generation, the life sciences have a unique opportunity to become one of the greatest beneficiaries of research in machine learning and AI, and also to inspire foundational developments in it.

In 2022, the genomics platform at the Broad Institute of MIT and Harvard generated about 80 petabytes of data, a scale of data generation similar to that of Twitter during the same period. And that is just one example of the explosion of biological data under way around the world. For me, a statistician and computer scientist at the Broad Institute, rapidly growing datasets in fields such as cell biology have been the inspiration for my research in machine learning and AI over the past decade.

Technologies such as single-cell RNA sequencing make it possible to profile, in a single experiment, the expression of the 20,000 human genes across one million single cells [1]. Similarly, advanced imaging technologies, including spatial transcriptomics, are enabling detailed studies of tissues at the single-cell level, combining gene expression with rich morphological features [2]. These data present incredible opportunities to understand not just the units of life, but also the programs of life: How do genes interact to give rise to a particular cell type? How do different types of cells organize to give rise to intricate tissue architectures? And how can we design interventions to control any cell-state transition precisely, such as from diseased to healthy or from one cell type to another?