Areas of Focus

Advances in biomedical technologies, including next-generation sequencing, single-cell genomics, and medical imaging, are producing an explosion of data. That data gives researchers the opportunity to answer some of the most fundamental questions about the “programs” of life. Yet the key drivers of advances in machine learning today are self-driving cars, advertising, and recommender systems.

Converging biology and machine learning

Those applications call for machine learning tools optimized for prediction accuracy. Prediction alone is insufficient for biological problems, where we seek to understand the genetic circuits of cells and the root causes of disease. To address this gap, we are developing machine learning methods that go beyond prediction to uncover causal mechanisms.

Cells & Optimal Perturbation Design

With the development of genetic technologies to precisely alter, or "perturb," cells comes the opportunity to understand cell-state transitions, which are fundamental to any biological process. But the sheer number of ways we can perturb cells makes it impractical to test every alteration in the lab. That's why the Eric and Wendy Schmidt Center is developing novel active learning frameworks that can home in on the perturbations most likely to bring about desired cell-state transitions, and that can provide other insights into how cells work.

Tissues & Causal Representation Learning

Recent technological developments now let us obtain RNA sequencing data from whole tissue sections without losing information about where each cell is located. But computational methods for analyzing these spatial datasets remain inadequate. In particular, we need methods that can seamlessly integrate images, sequencing data, and 3D coordinates from large-scale spatial transcriptomic studies. To that end, the Eric and Wendy Schmidt Center is developing causal representation learning methods that integrate these different kinds of data to uncover the mechanisms governing how tissues are organized in health and disease.

Organisms & Multimodal Representation Learning

With the rise of biobanks around the world, we are entering an era in which millions of individuals will have whole-genome sequences, detailed health histories, and high-resolution imaging phenotypes. This brings the opportunity to develop exquisite characterizations of diseases and more accurate predictions of who will respond best to which therapies. Motivated by the goal of making precision medicine a reality, we are developing multimodal representation learning methods to make better use of rich, multimodal clinical datasets from partner health care systems.
Leadership
Our co-directors have spent their careers bridging the gap between data science and biology.
Publications
Learn more about our fellows’ research on new computational methods for drug discovery, precision medicine, and more.