Broad Institute - Eric and Wendy Schmidt Center

Poster Abstracts

First place: Ada Fang

34 – Learning Universal Representations of Intermolecular Interactions with ATOMICA (Ada Fang, Michael Desgagné, Zaixi Zhang, Andrew Zhou, Joseph Loscalzo, Bradley L. Pentelute, Marinka Zitnik)

Second place: Anush Chiappino-Pepe

24 – Biosynthesis of non-standard amino acids with AI (Anush Chiappino-Pepe*, Peter G Mikhael, Itamar Chinn, Bogdan Budnik, Alexander Pauer, Taylor Lanosky, Jenny M. Tam, Gregory Stephanopoulos, Regina Barzilay, George M. Church)

Third place: Josephine Yates

13 – SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping (Josephine Yates, Mitra Shavakhi, Toni K. Choueiri, Rebecca Leary, Lisa Kattenhorn, Eliezer M. Van Allen, Caroline Uhler)

Third place: Mahshid Heidari and Mina Karimpour

42 – Evaluating Single-Cell Perturbation Response Models Is Far from Straightforward (Mahshid Heidari, Mina Karimpour, Sumana Srivatsa, Hesam Montazeri)

Best Short Talk: Maile Hirschmann and Dawn Chen

Generative Design of Cell Type-Specific RNA Splicing Elements for Programmable Gene Regulation

Please see a list of all presenters and their abstracts below.

1 -- Efficient Personalization of Generative Models via Optimal Experimental Design

Guy Schacht, Ziyad Sheebaelhamd, Riccardo De Santi, Mojmír Mutný, Andreas Krause

Preference learning from human feedback can align generative models with the needs of end-users. Human feedback is costly and time-consuming to obtain, which creates demand for data-efficient query selection methods. This work presents a novel approach that leverages optimal experimental design to ask humans the most informative preference queries, from which we can efficiently elucidate the latent reward function modeling user preferences. We formulate preference query selection as the problem of maximizing the information about the underlying latent preference model. We show that this problem has a convex optimization formulation, and introduce a statistically and computationally efficient algorithm, ED-PBRL, that is supported by theoretical guarantees and can efficiently construct structured queries such as images or text. We empirically evaluate the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires fewer preference queries than random query selection.

2 -- Large Language Models Generate Stigmatizing Language During Reasoning Over Real-World Clinical Data

Yutong Yang, Bowen Gu, David B. Hathaway, Richard Wyss, Qingyu Chen, Kueiyu Joshua Lin, Li Zhou, Jie Yang

Stigmatizing language in clinical documentation can negatively impact patient care and health outcomes through multiple pathways, including reduced quality of care, increased medical errors, and poorer patient-provider relationships. Large language models (LLMs) are increasingly integrated into clinical workflows to process and generate electronic health records. However, whether LLMs generate stigmatizing content when reasoning over real-world clinical data has not been systematically evaluated.

We examined the reasoning text of 95 advanced LLMs on 35 real-world English clinical tasks, under chain-of-thought settings. The 35 tasks were sourced from the BRIDGE benchmark, a comprehensive collection of real-world clinical tasks. A clinician-validated natural language processing tool was deployed to automatically identify stigmatizing language in LLM reasoning text. The tool achieved a positive predictive value (PPV) of 95% and a negative predictive value (NPV) of 100%.

Among all 3,325 model-task pairs, 2,804 pairs (84.33%) showed non-zero stigma rates, with an overall mean rate of 1.96%. General-purpose models showed higher stigma rates than clinically specialized models (2.03% versus 1.68%; p<0.01), and reasoning-enabled models exceeded non-reasoning systems (2.40% versus 1.72%; p<0.001). Tasks involving discharge summaries showed an average rate of 3.57%, exceeding radiology reports (0.04%) by over an order of magnitude. Stigma rates of input context and output reasoning text are significantly correlated (Pearson r=0.484; p<0.001). Reasoning text in incorrect predictions contained higher stigma rates than correct outputs (1.46% versus 1.40%; p=0.006).

LLMs systematically generate stigmatizing language when reasoning over clinical tasks, which may negatively impact patient trust and dignity. Reasoning-enabled models produced higher rates than non-reasoning models, suggesting that intermediate reasoning steps may expose implicit biases. These findings underscore the need for targeted term suppression, deployment safeguards, and systematic evaluation.

3 -- Graph AI generates neurological hypotheses validated in molecular, organoid, and clinical systems

Ayush Noori, Joaquín Polonuer, Katharina Meyer, Bogdan Budnik, Shad Morton, Xinyuan Wang, Sumaiya Nazeen, Yingnan He, Iñaki Arango, Lucas Vittor, Matthew Woodworth, Richard C. Krolewski, Michelle M. Li, Ninning Liu, Tushar Kamath, Evan Macosko, Dylan Ritter, Jalwa Afroz, Alexander B. H. Henderson, Lorenz Studer, Samuel G. Rodriques, Andrew White, Noa Dagan, David A. Clifton, George M. Church, Sudeshna Das, Jenny M. Tam, Vikram Khurana, Marinka Zitnik

Neurological diseases are the leading global cause of disability, yet most lack disease-modifying treatments. We present PROTON, a heterogeneous graph transformer that generates testable hypotheses across molecular, organoid, and clinical systems. To evaluate PROTON, we apply it to Parkinson's disease (PD), bipolar disorder (BD), and Alzheimer's disease (AD). In PD, PROTON linked genetic risk loci to genes essential for dopaminergic neuron survival and predicted pesticides toxic to patient-derived neurons, including the insecticide endosulfan, which ranked within the top 1.29% of predictions. In silico screens performed by PROTON reproduced six genome-wide alpha-synuclein experiments, including a split-ubiquitin yeast two-hybrid system (normalized enrichment score [NES] = 2.30, FDR-adjusted p < 1E-4), an ascorbate peroxidase proximity labeling assay (NES = 2.16, FDR < 1E-4), and a high-depth targeted exome sequencing study in 496 synucleinopathy patients (NES = 2.13, FDR < 1E-4). In BD, PROTON predicted calcitriol as a candidate drug that reversed proteomic alterations observed in cortical organoids derived from BD patients. In AD, we evaluated PROTON predictions in health records from n = 610,524 patients at Mass General Brigham, confirming that five PROTON-predicted drugs were associated with reduced seven-year dementia risk (minimum hazard ratio = 0.63, 95% CI: 0.53–0.75, p < 1E-7). PROTON generated neurological hypotheses that were evaluated across molecular, organoid, and clinical systems, defining a path for AI-driven discovery in neurological disease.

4 -- Improving atlas-scale single-cell annotation models with hierarchical cross-entropy loss

Sebastiano Cultrera di Montesano, Davide D'Ascenzo, Srivatsan Raghavan, Ava P. Amini, Peter S. Winter, Lorin Crawford

Accurately annotating cell types is essential for extracting biological insight from single-cell RNA sequencing data. Although cell types are naturally organized into hierarchical ontologies, most computational models do not explicitly incorporate this structure into their training objectives. Here, we introduce a hierarchical cross-entropy loss that aligns model objectives with biological structure. Applied to architectures ranging from linear models to transformers, this simple modification improves out-of-distribution performance by 12–15% without added computational cost. Critically, we underscore the need to focus on new data generation that improves the connectivity among annotated cell types. Our work suggests that this is likely to yield more generalizable algorithms than increasing model complexity alone.
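As an illustration only, one common way to make a cross-entropy loss hierarchy-aware is to sum leaf probabilities into their ancestor nodes and average the negative log-probability over the true label and all of its ancestors. The abstract does not specify the authors' formulation, so the function names, the toy ontology, and the equal weighting of levels below are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of leaf logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_cross_entropy(logits, true_leaf, ancestors, descendants):
    """Average -log p(node) over the true leaf and each of its ancestors.

    ancestors[leaf]   -> list of ancestor node ids (hypothetical structure)
    descendants[node] -> list of leaf indices under that node
    Equal weighting of levels is an assumption, not the authors' choice.
    """
    probs = softmax(logits)
    nodes = [true_leaf] + list(ancestors[true_leaf])
    loss = 0.0
    for node in nodes:
        # Probability of an internal node is the sum over its leaves.
        node_prob = sum(probs[leaf] for leaf in descendants[node])
        loss -= math.log(node_prob)
    return loss / len(nodes)


# Toy ontology: leaves 0 (CD4 T cell) and 1 (CD8 T cell) share the parent
# "T cell"; leaf 2 (B cell) has no parent in this sketch.
ANCESTORS = {0: ["T cell"], 1: ["T cell"], 2: []}
DESCENDANTS = {0: [0], 1: [1], 2: [2], "T cell": [0, 1]}
```

Under this sketch, mistaking a CD4 T cell for a CD8 T cell costs less than mistaking it for a B cell, because the shared "T cell" node still receives most of the probability mass.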

5 -- Interaction-aware generative modeling for bioisosteric drug design

Keir Adams*, Kento Abeywardane*, Jenna Fromer, Connor Coley

Engineering molecules to exhibit precise 3D intermolecular interactions with their environment forms the basis of chemical design. In small-molecule drug design, interactions that drive binding affinity and potency are primarily governed by a molecule’s shape, electrostatics, and pharmacophores. These molecular abstractions have enabled the design of bioisosteres—molecules that maintain similar 3D interaction profiles but have diverse 2D graph structures that potentially enhance other physicochemical properties. It is common in ligand-based drug design to virtually screen chemical libraries for bioisosteric analogues of known bioactive compounds with scoring functions that assess the similarity of these key features. However, these methods are fundamentally limited by inefficient, undirected searches and their inability to explore novel chemical space. 3D deep generative models offer an efficient approach for navigating these complex subspaces.

We hypothesize that a generative model that learns the joint distribution over 3D molecular structures and their interaction profiles may facilitate 3D interaction-aware chemical design. Specifically, we present ShEPhERD, an SE(3)-equivariant diffusion model that learns to jointly denoise 3D molecular graphs and explicit representations of their shapes, electrostatic potential surfaces, and (directional) pharmacophores from Gaussian noise. We demonstrate ShEPhERD’s potential for impact via exemplary drug design tasks including natural product ligand hopping, protein-blind hit diversification, and bioisosteric fragment merging. These diverse tasks are enabled by ShEPhERD’s ligand-based formulation, which allows training on any desired chemical space and potential applications to cases with only phenotypic feedback. Beyond hit expansion via scaffold hopping, the diffusion framework is amenable to tasks in lead optimization such as scaffold decoration. This flexibility positions ShEPhERD as a general framework for 3D interaction-aware molecular design, with applications beyond the scope of ligand-based drug discovery.

6 -- Automated Analysis of Polar Growth Dynamics in Fission Yeast: A Hybrid Deep Learning and Geometric Modeling Approach

Michael Widener, Carmen Rivera, Maitreyi Das

Cell shape is critical for biological function, yet quantifying the dynamics of polarized growth in rod-shaped organisms like Schizosaccharomyces pombe remains a high-throughput bottleneck. Manual annotation of subcellular features, such as the "birth scar", a subtle ridge marking previous division sites, is subjective and slow. Distinguishing the "Old End" from the "New End" is essential for correlating localized protein activity (e.g., Cdc42 and Rho1) with growth rates. I developed a computational pipeline that automates this analysis by integrating deep learning for segmentation with differential geometry for feature extraction. First, I utilized Cellpose, a deep learning model, to segment individual cells from brightfield microscopy data. Second, I implemented Principal Component Analysis (PCA) to standardize the major growth axis. To detect the birth scar, I applied B-spline interpolation to cell contours and calculated signed curvature through numerical differentiation. This transforms subtle visual features into a mathematically quantifiable signal. To handle biological variability and imaging artifacts, I implemented a context-aware validation algorithm with an "Asymmetric Fallback" mechanism to infer scar positions based on axial symmetry. Finally, I utilized the Hungarian algorithm for temporal cell tracking, allowing for the independent quantification of growth curves for "Old" and "New" compartments. This tool enables high-throughput, non-invasive analysis of growth dynamics without the phototoxicity associated with fluorescent reporters. This hybrid approach demonstrates the power of applying AI and discrete differential geometry to solve fundamental challenges in quantitative microbiology.
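The curvature step can be sketched as follows. This is a simplified stand-in rather than the pipeline described above: it replaces B-spline interpolation with central finite differences on an ordered, closed contour, and the function name is hypothetical. It uses the standard signed-curvature formula for a planar parametric curve, (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2).

```python
import math


def signed_curvature(points):
    """Signed curvature at each vertex of a closed, ordered 2D contour.

    points: list of (x, y) tuples, assumed roughly evenly spaced.
    Positive values mean the contour turns counterclockwise at that vertex.
    """
    n = len(points)
    curvatures = []
    for i in range(n):
        x0, y0 = points[i - 1]          # previous vertex (wraps around)
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]    # next vertex (wraps around)
        # Central differences for first and second derivatives.
        dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0
        ddx, ddy = x2 - 2.0 * x1 + x0, y2 - 2.0 * y1 + y0
        curvatures.append((dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5)
    return curvatures
```

On a counterclockwise circle of radius r the result is close to 1/r at every vertex, which gives a quick sanity check before applying it to segmented cell outlines.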

7 -- Mapping spatial gradients in spatial transcriptomics data with score matching

An Wang, Donald Geman, Uthsav Chitra*, Laurent Younes*

Spatial transcriptomics (ST) technologies measure gene expression at thousands of locations within a two-dimensional tissue slice, enabling the study of spatial gene expression patterns. Spatial variation in gene expression is characterized by spatial gradients, or the collection of vector fields describing the direction and magnitude in which the expression of each gene increases. However, the few existing methods that learn spatial gradients from ST data either make restrictive and unrealistic assumptions on the structure of the spatial gradients or do not accurately model discrete transcript locations/counts. We introduce SLOPER (for Score-based Learning Of Poisson-modeled Expression Rates), a generative model for learning spatial gradients (vector fields) from ST data. SLOPER models the spatial distribution of mRNA transcripts with an inhomogeneous Poisson point process (IPPP) and uses score matching to learn spatial gradients for each gene. SLOPER utilizes the learned spatial gradients in a novel diffusion-based sampling approach to enhance the spatial coherence and specificity of the observed gene expression measurements. We demonstrate that the spatial gradients and enhanced gene expression representations learned by SLOPER lead to more accurate identification of tissue organization, spatially variable gene modules, and continuous axes of spatial variation (isodepth) compared to existing methods. In particular, SLOPER reveals spatial colocalization patterns of testicular cell types and marker genes at different temporal stages of spermatogenesis.

8 -- Decoding Shared and Condition-Specific Transcriptional Programs in Wound Healing with Representation Learning

Ozgur Beker, Dreyton Amador, Jose Francisco Pomarino Nima, Simon Van Deursen, Yvon Woappi, Bianca Dumitrascu

Single-cell genomics enables the study of cell states and cell state transitions across biological conditions like aging, drug treatment, or injury. However, existing computational methods often struggle to simultaneously disentangle shared and condition-specific transcriptional patterns, particularly in experimental designs with missing data, unmatched cell populations, or complex attribute combinations. To address these challenges, we present Patches, which identifies universal transcriptomic features alongside condition-dependent variations in scRNA-seq data. Using conditional subspace learning, Patches enables robust integration, cross-condition prediction, and biologically interpretable representations of gene expression. Unlike prior methods, Patches excels in experimental designs with multiple attributes, such as age, treatment, and temporal dynamics, distinguishing general cellular mechanisms from condition-dependent changes. We applied Patches to both simulated data and real transcriptomic datasets from skin injury models, focusing on the effects of aging and drug treatment. Patches revealed shared wound healing patterns and condition-specific changes in cell behavior and extracellular matrix remodeling. These insights deepen our understanding of tissue repair and can identify potential biomarkers for therapeutic interventions, particularly in contexts where the experimental design is complicated by missing or difficult-to-collect data.

9 -- Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma

Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of increasing data size and dimensionality. This lack of theoretical understanding has led to uncertainty about which method to choose and limited the ability to fully exploit their potential. To address these challenges, we derive exact expressions for the asymptotic performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods. Our analysis reveals that while neither method uniformly dominates the other in the unweighted case, optimally weighted Stack-SVD dominates optimally weighted SVD-Stack. We extend our analysis to accommodate multiple shared components, and provide practical algorithms for estimating optimal weights from data, offering theoretical guidance for method selection in practical data integration problems. Extensive numerical simulations and semi-synthetic experiments on genomic data corroborate our theoretical findings.
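The two unweighted estimators described above can be sketched with NumPy as follows. The data layout (samples in rows, a shared right singular subspace), the function names, and the simulation parameters are assumptions made for illustration; the optimal weighting schemes derived in the work are not reproduced here.

```python
import numpy as np


def stack_svd(datasets, rank=1):
    """Stack-SVD: concatenate all datasets row-wise, then take a single SVD."""
    stacked = np.vstack(datasets)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:rank].T  # shape (n_features, rank)


def svd_stack(datasets, rank=1):
    """SVD-Stack: SVD each dataset, stack the top right singular vectors,
    then take an SVD of the stacked vectors to form a consensus."""
    tops = []
    for x in datasets:
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        tops.append(vt[:rank])
    _, _, vt = np.linalg.svd(np.vstack(tops), full_matrices=False)
    return vt[:rank].T  # shape (n_features, rank)


def simulate(n_datasets=3, n_samples=50, n_features=20, signal=10.0, seed=0):
    """Rank-one signal along a shared direction v plus Gaussian noise.
    All sizes and the signal strength are arbitrary illustrative values."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n_features)
    v /= np.linalg.norm(v)
    datasets = [
        signal * np.outer(rng.normal(size=n_samples), v)
        + rng.normal(size=(n_samples, n_features))
        for _ in range(n_datasets)
    ]
    return datasets, v
```

Comparing `abs(v @ estimate[:, 0])` for the two estimators while lowering `signal` is a simple way to see them diverge in the noisy regime the analysis targets.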

10 -- no poster

11 -- Multitask learning enables discovery of nontoxic antibiotics

Yu Zhang*, Andrew Hennes*, Aarti Krishnan*, Satotaka Omori*, Alicia Li*, Leif Sieben, Ronak Desai, Seyed Majed Modaresi, Maxwell Z. Wilson, Felix Wong, James J. Collins

Toxicity is a major barrier to drug development from early discovery through clinical trials. Computational models aim to reduce this barrier by predicting the toxicity profiles of small molecules, but their performance and generalizability are limited by the lack of large, chemically diverse datasets. Here, we report the largest publicly available cytotoxicity screen to date, empirically profiling 100,390 diverse compounds across three human cell types. Using this foundational dataset, we developed SafetyNet, a deep learning platform that combines pretrained molecular embeddings with multitask learning for high-accuracy toxicity prediction. SafetyNet outperforms existing models and generalizes well to predict mechanisms of toxicity as well as in vivo animal and clinical safety. We applied SafetyNet to a prospective screen for compounds with antibiotic activity against Escherichia coli and Klebsiella pneumoniae—Gram-negative pathogens for which discovering nontoxic antibiotics remains particularly challenging—and found that SafetyNet substantially enriched for nontoxic antibacterial compounds. SafetyNet offers a scalable and extensible framework for toxicity prediction that promises to reduce toxicity-related drug attrition and accelerate the discovery of selective drug candidates.

12 -- scDataset: Scalable Data Loading for Deep Learning on Large-Scale Single-Cell Omics

Davide D'Ascenzo, Sebastiano Cultrera di Montesano

Training deep learning models on single-cell datasets with hundreds of millions of cells requires loading data from disk, as these datasets exceed available memory. While random sampling provides the data diversity needed for effective training, it is prohibitively slow due to the random access pattern overhead, whereas sequential streaming achieves high throughput but introduces biases that degrade model performance. We present scDataset, a PyTorch data loader that enables efficient training from on-disk data with seamless integration across diverse storage formats. Our approach combines block sampling and batched fetching to achieve quasi-random sampling that balances I/O efficiency with minibatch diversity. On Tahoe-100M, a dataset of 100 million cells, scDataset achieves more than two orders of magnitude speedup compared to true random sampling while working directly with AnnData files. We provide theoretical bounds on minibatch diversity and empirically show that scDataset matches the performance of true random sampling across multiple classification tasks.
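The idea of combining block sampling with batched fetching can be sketched as an index generator. This is not the scDataset API: the function name and the parameters `block_size` and `fetch_factor` are hypothetical, and the sketch only produces indices rather than reading from disk.

```python
import random


def block_shuffled_batches(n_cells, block_size, fetch_factor, batch_size, seed=0):
    """Yield minibatches of cell indices in quasi-random order.

    Contiguous blocks keep disk reads mostly sequential; shuffling blocks and
    then shuffling within each fetched group restores minibatch diversity.
    """
    rng = random.Random(seed)
    # Block sampling: split the index range into contiguous blocks and
    # visit the blocks in random order.
    blocks = [
        list(range(start, min(start + block_size, n_cells)))
        for start in range(0, n_cells, block_size)
    ]
    rng.shuffle(blocks)
    # Batched fetching: read enough blocks for several minibatches at once.
    blocks_per_fetch = max(1, (fetch_factor * batch_size) // block_size)
    for i in range(0, len(blocks), blocks_per_fetch):
        fetched = [idx for block in blocks[i:i + blocks_per_fetch] for idx in block]
        rng.shuffle(fetched)  # mix cells from different blocks
        for j in range(0, len(fetched), batch_size):
            yield fetched[j:j + batch_size]
```

Larger blocks favor I/O throughput, while a larger fetch factor mixes more blocks into each minibatch, which is the trade-off the abstract describes.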

13 -- SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping

Josephine Yates, Mitra Shavakhi, Toni K. Choueiri, Rebecca Leary, Lisa Kattenhorn, Eliezer M. Van Allen, Caroline Uhler

Foundation models enable knowledge transfer across data modalities and tasks, yet foundation models for spatial biology remain in their early stages, largely centered on encoding single-cell representations in spatial context without fully integrating transcriptomic and morphological information to delineate functional niches. Here we introduce SpatialFusion, a lightweight multimodal foundation model that identifies biologically coherent microenvironments defined by distinct pathway activation patterns rather than spatial proximity alone. SpatialFusion integrates paired histopathology, gene expression, and inferred pathway activity into a unified representation. Compared with two specialist niche-detection methods and four spatial foundation models, SpatialFusion performs competitively and consistently resolves fine-grained spatial niches with unique pathway-level signatures. Applying the model to two Visium HD cohorts uncovered a pre-malignant niche in morphologically normal mucosa adjacent to colorectal tumors and revealed distinct malignant microenvironments in non-small cell lung cancer that were predictive of tumor stage. Overall, SpatialFusion offers a versatile framework for multimodal spatial analysis, enabling the discovery of new morpho-molecular niches with significant biological and clinical relevance.

14 -- GeneGrad: Geometric Gradient-Based Biomarker Discovery in Single-Cell Transcriptomics

Shupeng Luxu, Rong Ma

Single-cell technologies have enabled the characterization of gene expression at cellular resolution, providing critical insights into both static cellular heterogeneity and dynamic biological processes such as cell differentiation and disease progression. Current approaches for cell type annotation rely heavily on clustering resolution and known reference marker genes, which may overlook biomarkers that capture more subtle differences critical for distinguishing transient cell states within existing cell types that are associated with developmental or disease processes. Although several “cluster-free” biomarker identification methods have been proposed, they do not fully exploit the information encoded in the smooth local geometry of the cell-state manifold.

Here, we introduce GeneGrad, a novel computational framework designed to identify key genes by estimating gene-specific geometric expression gradients: the directions of maximal expression increase within each cell’s local neighborhood, defined with respect to the intrinsic geometry of the high-dimensional cell-state manifold. The resulting gene-gradient vector fields can be projected into two-dimensional space alongside the cells, enabling the discovery of important geometric patterns such as flow divergence, local heterogeneity, and alignment with developmental trajectory. This analysis facilitates the identification of important genetic programs driving cell-state transitions. We first validate our methodology on synthetic data, demonstrating high accuracy in both gradient estimation and direction recovery. We then apply GeneGrad to developmental single-cell datasets of hematopoietic progenitor cells (HPCs), identifying markers predictive of their subsequent divergence into ten lineages. We further extend the framework beyond single-cell transcriptomics to additional modalities, including ATAC-seq, highlighting its utility for uncovering biologically meaningful signals in complex cellular processes.

15 -- DeepSLP: a deep learning framework leveraging cancer dependency data to predict synthetic lethality in human cells

Xiang Zhang, Ke-Chin Chen, Daniel Chang, Arshia Z. Hassan, Michael Costanzo, Kevin R. Brown, Maximilian Billmann, Brenda J. Andrews, Jason Moffat, Charles Boone, Chad L. Myers*

Motivation: Synthetic lethality (SL) offers a promising avenue for developing cancer-selective therapies. However, experimentally mapping the vast landscape of genetic interactions in human cells remains labor-intensive and incomplete. Computational predictors can narrow the search, yet most existing models rely on limited training standards and heterogeneous curated features that bias results toward well-studied genes and leave many pairs unassessed. The growing availability of systematically quantified genetic interactions from CRISPR screens and human functional genomics datasets provides an opportunity to build more accurate and comprehensive machine learning models of SL relationships.

Results: We present DeepSLP (Deep Synthetic Lethality Predictor), a deep neural network framework for predicting synthetic lethal interactions using functional genomics features derived from the Cancer Dependency Map (DepMap). Pairwise feature vectors are constructed from (i) CRISPR knockout fitness (gene effect) profiles and (ii) gene expression signatures across more than 1,000 cancer cell lines, enabling genome-wide coverage without relying on external annotations. DeepSLP is trained and evaluated on a recently published HAP1 genetic interaction dataset comprising about 4 million quantitatively scored genetic interactions in human HAP1 cells. We also experimentally validated predictions involving an additional query gene, MTOR, showcasing DeepSLP’s generalizability on previously unseen targets. Assessed for its performance in both in-context and cross-context scenarios, DeepSLP demonstrates strong predictive performance, scalability, and robustness, supporting its potential to inform the design of more efficient genetic interaction studies and deepen our understanding of SL in human cells.

16 -- Deep learning-based analysis reveals patient-level cancer therapy trajectories using single-cell PBMC chromatin images

Hannah M. Schlüter, Trinadha Rao Sornapudi, Dominic Leiser, Sandra Koller, Zeynep Karavelioglu, Caroline Uhler, Damien Weber, G. V. Shivashankar

Peripheral blood mononuclear cells (PBMCs) profiled in different data modalities have emerged as diagnostic and prognostic biomarkers for cancer. Here we demonstrate that PBMC chromatin images contain sufficient information to track patients’ trajectories during and after radiation therapy. We collected images from 150 patients across a variety of cancer types including Central Nervous System (CNS) cancer, Head & Neck cancer, Chordoma and Chondrosarcoma, Sarcoma, and Lymphoma, and adapted a multiple instance learning-based framework, which effectively enhanced the patient-level classification and interpretation of images from diverse samples compared to simpler cell-level neural network-based classifiers. We found that patients whose PBMCs returned to a state more similar to healthy after the therapy had more favorable outcomes. Moreover, our framework could indicate whether Head & Neck cancer patients’ PBMCs would likely return to a state similar to healthy after radiation therapy using solely chromatin images from before the treatment. These findings motivate further study of PBMC chromatin images to be used as a non-invasive, easily obtained, and inexpensive biomarker for cancer detection, treatment monitoring, and treatment response prediction.

17 -- GS2: A Gram-Schmidt Approach to Gene Selection for High-Dimensional Genomic Data

Bahram Yaghooti, Bruno Sinopoli, Yang Eric Li

High-dimensional genomic data often include tens of thousands of features (e.g., genes), necessitating effective dimensionality reduction. Principal Component Analysis (PCA) is widely used to support downstream analyses by mitigating the curse of dimensionality and improving model stability. However, PCA has key limitations: as a linear technique, it cannot capture nonlinear dependence; it can be unstable when features far exceed samples; and it performs poorly on sparse single-cell data with dropout and zero inflation. To address these limitations, we propose the Gram-Schmidt Gene Selection (GS2) algorithm. GS2 is an unsupervised feature selection method that iteratively removes redundant and irrelevant genes via Gram-Schmidt orthogonalization on multilinear polynomials and serves as a preprocessing step before PCA. GS2 begins by selecting the highest-variance gene and proceeds iteratively. At each step, it constructs a residual by removing projections onto functions associated with previously selected genes, computes residual variances, and selects the gene with the largest residual variance. It subsequently orthogonalizes the functions of this newly selected gene with respect to those of the previously selected genes so that linear and nonlinear dependencies explained by the selected genes are removed before the next step. The process continues until the maximum residual variance drops below a threshold, yielding a subset of genes that preserves the informative structure of the data while discarding genes that are linearly or nonlinearly redundant. It operates effectively on sparse datasets and is well-suited for high-dimensional settings, making it efficient and practical for large-scale genomic data. We evaluate GS2 through numerical experiments on multiple datasets. GS2 consistently improves classification performance while reducing feature redundancy, enabling clearer model interpretation. The biological relevance of selected genes is supported by Gene Ontology (GO) enrichment analysis, revealing significant associations with pathways and functions consistent with known cellular mechanisms.
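The greedy loop described above can be sketched in a simplified, linear-only form. This sketch omits the multilinear polynomial functions that let GS2 capture nonlinear redundancy, so it removes only linear dependence; the function name and the input layout (one list of expression values per gene) are assumptions.

```python
def greedy_residual_selection(columns, threshold):
    """Select genes by largest residual variance, Gram-Schmidt style.

    columns:   list of gene vectors, each a list of floats over the same cells
    threshold: stop once the largest residual variance falls below this value
    Linear-only simplification; not the full GS2 algorithm.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    n_cells = len(columns[0])
    # Center each gene so that squared norms correspond to variances.
    residuals = []
    for col in columns:
        mean = sum(col) / n_cells
        residuals.append([x - mean for x in col])
    selected = []
    while len(selected) < len(columns):
        variances = [sum(x * x for x in r) / n_cells for r in residuals]
        best = max(range(len(residuals)), key=lambda j: variances[j])
        if variances[best] < threshold:
            break
        selected.append(best)
        direction = list(residuals[best])
        norm_sq = sum(x * x for x in direction)
        # Remove the component along the newly selected gene from every gene.
        for j, r in enumerate(residuals):
            coef = sum(a * b for a, b in zip(r, direction)) / norm_sq
            residuals[j] = [a - coef * b for a, b in zip(r, direction)]
    return selected
```

A gene that is an exact multiple of an already selected gene ends up with zero residual variance and is never picked, which is the redundancy-removal behavior the abstract describes for the linear case.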

18 -- Causal Representation Meets Stochastic Modeling under Generic Geometry

Jiaxu Ren, Yixin Wang, Biwei Huang

Learning meaningful causal representations from observations has emerged as a crucial task for facilitating machine learning applications and driving scientific discoveries in fields such as climate science, biology, and physics. This process involves disentangling high-level latent variables and their causal relationships from low-level observations. Previous work in this area that achieves identifiability typically focuses on cases where the observations are either i.i.d. or follow a latent discrete-time process. Nevertheless, many real-world settings require identifying latent variables that are continuous-time stochastic processes (e.g., multivariate point processes). To this end, we develop identifiable causal representation learning for continuous-time latent stochastic point processes. We study its identifiability by analyzing the geometry of the parameter space. Furthermore, we develop MUTATE, an identifiable variational autoencoder framework with a time-adaptive transition module that enables inference about stochastic dynamics. Across simulated and empirical studies, we find that MUTATE can effectively answer scientific questions, such as the accumulation of mutations in genomics and the mechanisms driving neuron spike triggers in response to time-varying dynamics.

19 -- Error-controlled decisions for safe use of medical foundation models

Intae Moon, Ying Jin, Marinka Zitnik

Background: Medical foundation models (FMs) often output point predictions without decision-time error guarantees, so strong average accuracy can still yield harmful mistakes among predictions treated as confident.

Objective: We develop StratCP, a model-agnostic decision layer built on conformal prediction (CP), to support FM-based clinical decision support under user-specified error budgets.

Method: StratCP extends CP from population-level guarantees to action-aligned calibration by explicitly splitting decisions into (i) an action arm that selects patients whose FM predictions are eligible for downstream action while controlling the false discovery rate (FDR) at a user-specified budget (e.g., 5%), and (ii) a deferral arm that returns calibrated prediction sets achieving target coverage among deferred patients (e.g., containing the true disease status for 95% of deferred patients). When available, diagnostic-guideline structure is encoded as a utility graph to produce clinically coherent prediction sets for deferred patients without sacrificing coverage guarantees.
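For background, the standard split conformal prediction procedure that methods of this kind build on can be sketched as below. This is not StratCP: it shows only the basic calibrated prediction set with marginal coverage, without the action arm, FDR control, or utility graph, and the function names are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set.

    cal_probs:  list of per-class probability lists from the model
    cal_labels: list of true class indices
    alpha:      target miscoverage rate (e.g., 0.05 for 95% coverage)
    """
    # Nonconformity score: one minus the probability of the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

Confident inputs yield single-class sets, while ambiguous inputs yield larger sets, which is the behavior a deferral arm can pass on to a clinician.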

Results: We pair StratCP with a retinal FM (Zhou et al., Nature 2023) and a pathology FM (Chen et al., Nature Medicine 2024) and evaluate it across ophthalmology and neuro-oncology. In eye-condition diagnosis, StratCP and CP both meet the 5% acted-upon error budget, but StratCP enables action for more patients on average (119.2 vs. 97.5), reflecting better calibration and more efficient use of the budget. In central nervous system (CNS) tumor subtyping, StratCP abstains as needed to satisfy the 5% budget (61 selected, FDR 0.048) while maintaining 95% target coverage for deferred patients; CP selects more cases but overspends the budget (174 selected, FDR 0.090). Utility enhancement further improves guideline coherence of deferred prediction sets (adjacent-grade sets: 0.83 vs. 0.12 without enhancement) while preserving 95% coverage. In adult-type diffuse glioma triage, StratCP supports slide-only glioblastoma (IDH-wildtype) diagnoses for 152 of 463 slides while keeping FDR near 5% (0.052), potentially reducing confirmatory molecular testing and shortening time to integrated diagnosis.

Conclusion: StratCP equips medical FMs with an error-budgeted decision layer (selection plus calibrated deferral) without retraining, enabling safer deployment across clinical domains.

20 -- An agentic AI spatial molecular foundation model of the mouse brain

Yichun He, Hao Sheng, Mingze Yuan, Yiming Zhou, Hailing Shi, Wendy Xueyi Wang, Zefang Tang, Zuwan Lin, Wenbo Wang, Na Li, Jia Liu*, Xiao Wang*

Comprehensive spatial characterization of the brain at single-cell molecular resolution is essential for decoding its structure, function, and disease mechanisms. However, current spatial transcriptomics atlases are individually limited in coverage, resolution, and gene throughput, and lack integration across platforms. Here, we introduce FuseMap, a self-supervised deep learning framework that generates a foundation model of the mouse brain by integrating seven large-scale mouse brain atlases spanning 18.6 million cells/spots, 26,665 genes, and 434 sections. FuseMap constructs unified gene, single-cell, and spatial representations to generate a comprehensive molecular common coordinate framework (molCCF), enabling harmonization of cell-type and brain-region nomenclature, imputation of transcriptome-wide spatial expression, informed selection of targeted gene panels, and discovery of novel regions and region-specific cell-cell interactions. Through transfer learning, FuseMap accurately maps and annotates new datasets, overcoming the limitations of manual labeling. We further empower FuseMap by developing an agentic AI system, which performs multi-step reasoning, planning, and tool-augmented knowledge retrieval for autonomous spatial brain atlas analysis and biological discovery in unseen datasets from various genetic backgrounds and experimental conditions. We demonstrate its utility for high-throughput biological discovery in cross-species comparison, aging brain analysis, and neurodegenerative disease studies. Beyond the brain, we further apply FuseMap and its AI agent to multi-organ analyses during embryonic development, demonstrating a scalable and generalizable framework for spatial single-cell biology.

21 -- A Signed Multi-Omics Atlas Knowledge Graph Foundation Model Enables Accurate and Causal Prediction of Drug Actions

Mohammadsadeq Mottaqi, Shuo Zhang, Lei Xie

Graph neural networks (GNNs) have emerged as powerful tools for studying drug-gene-protein-disease relationships, yet existing models face critical limitations: they rely on small, biased subsets of bioactive compounds, fail to capture the polarity of molecular interactions (e.g., inhibition vs. activation) and therapeutic effects (e.g., efficacy and side effects), and inadequately integrate multi-omics data. We introduce SIGMA-KG (SIGned Multi-omics Atlas Knowledge Graph) and FLASH (Fast Lightweight Architecture for Signed Heterogeneous GNN), a foundation model that addresses these challenges through a fundamentally different approach. SIGMA-KG integrates chemical genomics, transcriptomics, proteomics, and clinical data into a signed multi-layer multiplex network where edges explicitly encode the polarity of biological effects and clinical outcomes. This signed representation enables the model to leverage structural balance principles, such as drug-mediated suppression of disease-promoting genes, creating therapeutically beneficial negative-negative relationships. After self-supervised pre-training of FLASH on SIGMA-KG, the learned representations substantially outperform state-of-the-art conventional homogeneous (unsigned) and relational GNN models across critical prediction tasks: ligand-induced functional selectivity (agonist vs. antagonist), drug clinical effects (therapeutic vs. adverse effects), and drug-drug interactions. We applied FLASH and SIGMA-KG to drug repurposing and identified a number of promising drug candidates for multiple complex diseases. FLASH and SIGMA-KG provide a new framework for mechanistic drug discovery, enabling improved drug repositioning, rational polypharmacy design, and early-stage safety assessment.

22 -- Cross-modal transfer learning for mapping bulk cancer phenotypes at single-cell resolution

Aarthi Venkat, Daniel Marbach, Zhiwen Jiang, Nir Hacohen, Marinka Zitnik

In cancer, single-cell transcriptomics has revealed deep insights into the tumor microenvironment, capturing gene programs, cell states, and multicellular hubs that underlie tumor initiation, progression, and treatment resistance. Despite their potential to yield clinically actionable biomarkers, single-cell technologies are not typically used for clinical decision-making due to financial, conceptual, and computational constraints. Instead, bulk transcriptomic technologies, which measure the average expression across entire samples rather than individual cells, are a more routine component of cancer care. However, approaches to derive patient endotypes and predict treatment response from bulk data alone are fundamentally limited in their ability to characterize cell-type-specific changes and often exhibit poor generalizability across diseases and cohorts due to the reliance on predefined sets of biomarkers.

Here, we present a parameter-efficient transfer learning approach to flexibly identify cellular and molecular factors underlying bulk clinical phenotypes. We propose an ontology-aware contrastive learning objective that first models cell type proportions from single-cell data as signals on the Cell Ontology graph, then aligns single-cell and pseudobulked samples to preserve pairwise similarities defined by graph-based optimal transport. This enables learning bulk representations that capture cell type-level differences. We further propose a cohort-aware hard negative sampling strategy to improve cross-cohort generalization. Trained with 1,631 single-cell tumor samples from the Curated Cancer Cell Atlas (3CA), our approach improves preservation of cell type proportion similarity, achieving a 19.5% increase in within-cohort prediction and a 40.7% increase in cross-cohort prediction over existing methods across five independent cohorts. We additionally fine-tune our model on 330 bulk melanoma samples using a large-scale single-cell melanoma atlas, enabling prediction of immunotherapy response, overall survival, and associated cell populations enriched in bulk samples. Together, our method learns patient representations that capture single-cell-level differences between bulk samples, providing an interpretable framework for population selection and patient matching across cohorts and indications.

23 -- Towards Generalist Agents for Accelerating Biomedical Discovery

Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G. Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, Cassandra Masschelein, Yingze Wang, Haorui Wang, Haojun Jia, Chao Zhang, Hongyu Zhao, Martin Ester, Teresa Head-Gordon, Carla P. Gomes, Huan Sun, Chenru Duan, Philippe Schwaller, Wengong Jin

Large language models (LLMs) are opening new frontiers in scientific research, enabling capabilities ranging from literature retrieval and hypothesis generation to experiment planning and operation. In this talk, I will first present quantitative evidence that LLMs encode substantial scientific knowledge and that appropriate sampling/search strategies can reliably extract it. I will then focus on a key missing ingredient for applying LLMs to discovery in the natural sciences: automating objective-function design. To address this, I will introduce the Scientific Autonomous Goal-Evolving Agent (SAGA), which analyzes optimization outcomes, proposes improved objectives, and translates them into computable scoring functions with end-to-end validation. I will demonstrate SAGA across diverse discovery settings, including antibiotic design, inorganic materials design, functional DNA sequence design, and chemical process design. Finally, I will summarize lessons learned and discuss implications for the next generation of generalist scientific agents.

24 -- Biosynthesis of non-standard amino acids with AI

Anush Chiappino-Pepe*, Peter G Mikhael, Itamar Chinn, Bogdan Budnik, Alexander Pauer, Taylor Lanosky, Jenny M. Tam, Gregory Stephanopoulos, Regina Barzilay, George M. Church

Molecules with new functional groups that expand the amino acid alphabet could enable new forms of molecular recognition, cellular engineering, information storage, and catalysis, yet their availability has been limited to lengthy and costly chemical synthesis. Here we have discovered enzymes for the de novo biosynthesis of new non-standard amino acids. Using a machine-learning-guided search across plant proteomes coupled with ultra-sensitive metabolomics, we identified and characterized non-standard amino acid synthases that catalyze an unprecedented biochemistry for non-standard amino acid biosynthesis. These non-standard amino acid synthases define a new subclass of enzymes that expands the canonical scope of amino acid biosynthesis. Phylogenomic analyses indicate that these enzymes are widespread across angiosperms, revealing a previously unrecognized, widespread connection in amino acid metabolism. By uncovering nature’s capacity to expand the amino acid alphabet of life, this work opens new avenues for programmable molecular architectures and expanded biological codes.

25 -- Multimodal Characterization of Sleep as a Phenotype in Neurodevelopmental Conditions

Sanketh Vedula, Andrew Kim, Manoj Kumar, Olga Troyanskaya, Guillermo Sapiro

Sleep is an important and quantifiable phenotype that reflects underlying neurophysiological processes as well as behavioral and health-related factors. A growing body of work has reported associations between neurodevelopmental and neurodegenerative conditions and alterations in sleep timing, continuity, and variability, suggesting that sleep may provide a useful lens for studying these conditions. However, much of the existing literature relies on coarse or cross-sectional measurements, limiting the ability to capture heterogeneity across individuals and nights and motivating the use of longitudinal, multimodal data to study sleep more systematically.

In this study, we examine sleep patterns using data from the Simons Sleep Project, a home-based autism cohort (N=102; control=98) that combines multi-night objective sleep measurements with behavioral questionnaires and genetic information. The dataset includes physiological recordings from wearable devices such as EEG-enabled headbands, wrist-worn sensors, and under-mattress sleep monitors, alongside daily sleep diaries, standardized behavioral assessments, and whole-exome sequencing. This multimodal structure enables joint analysis of sleep timing, continuity, variability, and neurophysiological proxies in real-world settings, and supports investigation of how these features relate to behavioral phenotypes and genetic background.

We use multivariate statistical analysis frameworks to compare sleep features across individuals with Autism Spectrum Disorder (ASD), Attention Deficit Hyperactivity Disorder (ADHD), and neurotypical controls. The analysis focuses on distributional differences, intra-individual variability, and cross-modal relationships across physiological and behavioral measurements rather than diagnostic classification alone. In addition to conventional sleep summary metrics, we incorporate learned feature representations derived from foundation models that capture structure directly from physiological signals and reduce dependence on explicit sleep staging assumptions. In this poster, we will present findings from our ongoing study and discuss how multimodal analysis can support conservative and interpretable characterization of sleep heterogeneity, as well as generate hypotheses regarding biological mechanisms that may contribute to observed sleep differences.

26 -- no poster

27 -- Tell me who you walk with and I’ll tell you who you are: Bayesian network analysis reveals inflammation-dependent fungal interaction networks in ulcerative colitis from an underrepresented Latin American Cohort

Tamara Pérez-Jeldres, Ivania Valdés, Gabriel Ascui, Roberto Segovia, Cristian Hernández-Rocha, Carolina Pavez, Elisa Hernández, Verónica Silva, Elizabeth Arriagada, Lorena Azócar, Manuel Álvarez-Lobos, Erick Riquelme, Archana Sharma-Oates

IBD microbiome research has largely focused on bacterial dysbiosis in European populations, leaving the fungal component and population-specific interactions underexplored. We performed a comprehensive mycobiome and host integration analysis in a Chilean ulcerative colitis (UC) cohort including active UC, inactive UC, and non-IBD controls. Fungal community composition was profiled using ITS sequencing and analyzed using centered log-ratio–transformed abundances. Differentially abundant fungal species were identified across clinical comparisons using complementary statistical approaches, including LEfSe (LDA) and MaAsLin2. Species consistently detected as differentially abundant were used as candidate features for downstream analyses. To evaluate the discriminatory capacity of these fungal signatures, we constructed supervised classification models to distinguish clinical states using Random Forest and XGBoost. Model performance was assessed using standard classification metrics and SHAP values. Fungal species that consistently contributed to accurate classification across comparisons were retained as robust, disease-associated features. These selected fungal species were subsequently used to infer state-specific Bayesian networks for active UC, inactive UC, and control groups. Network structure learning was performed using bootstrap resampling to identify stable conditional dependencies among fungal species within each clinical state. This approach revealed inflammation-associated reorganization of fungal interaction networks, including changes in network connectivity, the number of connected components, and shifts in hub species that were not explained by abundance changes alone. In a subset of patients with available bulk RNA-sequencing data, Bayesian network–derived fungal species were integrated with host transcriptomic profiles. Variance-stabilized gene expression data were correlated with fungal abundances using within-state Spearman analyses, identifying coordinated fungi–host associations linked to immune and inflammatory pathways. Our results show that UC is associated with inflammation-dependent restructuring of fungal interaction networks and highlight the value of combining differential abundance testing, machine learning, and Bayesian network inference to characterize population-specific host–mycobiota interactions in underrepresented IBD cohorts.
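For readers unfamiliar with the centered log-ratio (CLR) transform mentioned in the abstract, the following is a minimal sketch of the standard definition: each log-abundance minus the mean log-abundance of the sample. The function name and the pseudocount used to handle zero counts are assumptions; the abstract does not say how zeros were treated.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-taxon counts.

    The pseudocount (an assumed choice) keeps log() defined for zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical ITS read counts for four fungal taxa in one sample.
sample = [120, 0, 35, 7]
transformed = clr(sample)
```

CLR values within a sample sum to zero, and without a pseudocount the transform is unchanged when all counts are multiplied by a constant, which is why it is commonly used for compositional sequencing data.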

28 -- CRIMP efficiently quantifies sub-cellular gene expression patterns from image-based spatial transcriptomics

Claire Wu, Hannah M. Schlüter, Uthsav Chitra, Caroline Uhler

Imaging-based spatial transcriptomics (iST) technologies measure mRNA transcripts at sub-cellular resolution, offering insight into how gene expression is distributed within individual cells. However, existing approaches for quantifying sub-cellular expression patterns either do not account for the high sparsity of iST data, or are computationally expensive and rely on geometric transformations that are often inaccurate for irregularly shaped cells. We introduce CRIMP (for Continuous Radial Intra-cellular Molecular Patterns), a highly efficient computer vision-based algorithm for quantifying radial, sub-cellular gene expression patterns. CRIMP models gene expression as a function of the “effective radius”—a 1-D coordinate measuring the distance from cell and nuclear boundaries that is computed using signed distance functions and is applicable for cells of any shape. CRIMP uses the effective radius to robustly quantify sub-cellular expression patterns for individual genes as well as identify groups of genes (modules) with shared sub-cellular expression patterns. We demonstrate that CRIMP accurately identifies sub-cellular expression patterns on data simulated at realistically sparse transcript densities. We further derive a statistical test for differential localization, a sub-cellular analogue of differential expression, measuring whether a gene has different sub-cellular localization patterns in two different groups of cells. On iST data from fibroblasts in culture and mouse brain tissue, CRIMP identifies novel functional gene modules and genes with differential localization across cell types.

29 -- BRIDGE: Benchmarking Large Language Models for Understanding Global Real-world Clinical Practice Text

Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang

Background: Large language models (LLMs) show promise in clinical applications, yet current benchmarks mostly rely on medical exam-style or PubMed-derived text, failing to reflect the complexity of real-world clinical contexts.

Objective: To introduce BRIDGE, a multilingual benchmark for real-world clinical text understanding, and to systematically characterize performance across a broad range of advanced LLMs, prompting strategies, languages, task types, and clinical specialties.

Methods: We curated publicly accessible clinical practice text from multiple global sources and constructed 87 tasks spanning 9 languages, 8 task types, 14 clinical specialties, and multiple stages of patient care. Tasks were standardized with consistent instructions, input–output formats, and evaluation protocols, and organized with a structured taxonomy by task nature and clinical scenario. We evaluated 95 state-of-the-art LLMs, including proprietary models (e.g., GPT-4o, Gemini), open-source general models (e.g., DeepSeek-R1, Qwen3 series), and medical-domain LLMs (e.g., MedGemma, Me-LLaMA). For each model, we tested three inference strategies: (1) zero-shot QA, (2) chain-of-thought (CoT) with explicit reasoning plus final answer, and (3) few-shot QA using five randomly sampled completed exemplars as in-context demonstrations. We further built a public-facing leaderboard enabling multi-level analyses across model families, strategies, languages, task types, and specialties.
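The three inference strategies described in the Methods paragraph differ only in how the prompt is assembled. The sketch below illustrates that difference; it is not the BRIDGE code. The function name, the prompt wording, the `Input:`/`Output:` layout, and the fixed seed are all hypothetical.

```python
import random

# Hypothetical wording; the benchmark's actual CoT instruction is not given in the abstract.
COT_SUFFIX = "Explain your reasoning step by step, then give the final answer."

def build_prompt(instruction, query, strategy="zero_shot", pool=(), k=5, seed=0):
    """Assemble a prompt for one task instance.

    strategy: "zero_shot", "cot", or "few_shot".
    pool: (input_text, answer) exemplars to sample from for few-shot prompting.
    """
    parts = [instruction]
    if strategy == "cot":
        parts.append(COT_SUFFIX)
    elif strategy == "few_shot":
        rng = random.Random(seed)
        # Sample k completed exemplars without replacement as in-context demonstrations.
        for text, answer in rng.sample(list(pool), k):
            parts.append(f"Input: {text}\nOutput: {answer}")
    elif strategy != "zero_shot":
        raise ValueError(f"unknown strategy: {strategy}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Toy exemplar pool, for illustration only.
pool = [(f"note {i}", f"label {i}") for i in range(20)]
zero = build_prompt("Classify the note.", "new note")
cot = build_prompt("Classify the note.", "new note", strategy="cot")
few = build_prompt("Classify the note.", "new note", strategy="few_shot", pool=pool)
```

Fixing the seed makes the sampled exemplars reproducible across models, which matters when comparing many models on the same task.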

Results: Performance varied substantially across languages, task types, and specialties. Under zero-shot prompting, the best overall models achieved scores around the mid-40s, with Gemini-2.5-Flash (44.8), DeepSeek-R1 (44.2), and GPT-4o (44.2) leading. Scaling trends largely held within model families (e.g., Llama, Qwen, Me-LLaMA). Few-shot prompting improved 91/95 models (95.8%), with Gemini-1.5-Pro rising from 43.8 to 55.5. In contrast, CoT—while improving interpretability—could reduce performance.

Conclusions: BRIDGE provides a comprehensive, multilingual evaluation of LLMs on real-world clinical text, highlighting persistent gaps and high variability. Few-shot prompting offers a strong, practical boost, whereas CoT may be unreliable, motivating further work on effectiveness, stability, and safer deployment in complex clinical settings.

30 -- Sensitive detection of rare immune cells in checkpoint inhibitor–induced myocarditis with scBoost

Fatima Zahra El Hajji, Salim El Mejjad, Salwa Enezari, Anas Bedraoui, Rachid El Fatimy, Tariq Daouda

Immune checkpoint inhibitors (ICIs) have transformed cancer therapy but can trigger severe immune-related toxicities. Among these, myocarditis is a life-threatening complication with high fatality rates. The cellular mechanisms underlying ICI-associated myocarditis (irMyocarditis) remain poorly defined, limiting early detection and risk stratification. Resolving this pathology requires high-resolution profiling of immune cell states, including rare pathogenic populations that may disproportionately drive tissue injury. Single-cell RNA sequencing (scRNA-seq) provides such resolution, yet integration of heterogeneous datasets often obscures rare but biologically important signals. Here, we introduce scBoost, a deep-learning framework designed to enhance rare cell type discovery in integrated scRNA-seq datasets. scBoost preserves critical biological signals through adaptive activation layers, enabling robust detection of sparse disease-relevant populations across complex, multi-batch datasets. We applied scBoost to a large irMyocarditis scRNA-seq cohort, including peripheral blood mononuclear cells (PBMCs; 366,066 cells) and cardiac-infiltrating immune cells (84,576 cells). While irMyocarditis is characterized by T-cell infiltration into the myocardium, scBoost identified a distinct population of proliferative cytotoxic CD8 T cells in peripheral blood prior to ICI therapy in patients who later developed myocarditis. These cells shared T-cell receptor clones with cycling CD8 T cells observed in the heart during disease, indicating systemic clonal expansion and targeted cardiac infiltration. Our findings reveal a rare circulating immune signature that precedes clinical onset of irMyocarditis and may serve as a predictive biomarker for immune-related cardiac toxicity. Beyond this, scBoost provides a generalizable framework for integrating scRNA-seq datasets to uncover clinically relevant rare cell populations, bridging computational innovation with translational immunology.

31 -- From Evolutionary Likelihood to Task-Specific Fitness: Few-Shot Adaptation for Efficient Protein Optimization

Zhilei Bei

Predicting the functional effects of protein mutations is a central challenge in protein engineering, yet experimental fitness measurements are expensive and scarce. This setting naturally gives rise to a few-shot task-conditioned inference problem, where each protein assay defines a task and a small support set of mutation–fitness pairs is used to infer the underlying fitness landscape. Existing approaches largely fall into two categories. Pretrained protein language models provide zero-shot mutation effect predictions based on evolutionary likelihoods, but are not adapted to task-specific fitness landscapes. In contrast, supervised fitness predictors can achieve higher accuracy, but require substantial labeled data and generalize poorly across proteins or assays. Together, these limitations motivate models that can infer task-aware protein-specific fitness landscapes from limited experimental data, enabling efficient protein optimization. We introduce a few-shot adaptation framework that aligns pretrained model likelihoods with task-specific fitness landscapes through preference learning, preserving the original model structure without requiring additional regression heads. To address data scarcity, we employ pairwise ranking objectives that transform N labeled examples into O(N²) preference pairs, substantially increasing the amount of effective supervision, and combine this with meta-transfer learning across multiple assays. We evaluate our method on deep mutational scanning datasets from ProteinGym, comparing against zero-shot scoring and supervised regression baselines. With as few as 64 labeled mutants per task, our approach achieves a 50% relative improvement in Spearman correlation over zero-shot baselines and matches or exceeds supervised models trained with 8× more labeled data. Overall, this work enables efficient and generalizable protein fitness modeling in low-data regimes, addressing a key bottleneck in practical protein engineering.
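The step from N labeled mutants to O(N²) preference pairs can be made concrete with a short sketch. The abstract does not specify the exact ranking loss, so the Bradley-Terry style logistic loss below is an assumption, as are the function names and the toy fitness values. In the setting described, the scores would be the pretrained model's likelihoods for the two mutants.

```python
import math
from itertools import combinations

def preference_pairs(mutants):
    """Turn (name, fitness) examples into (preferred, other) pairs; ties are skipped."""
    pairs = []
    for (a, fit_a), (b, fit_b) in combinations(mutants, 2):
        if fit_a > fit_b:
            pairs.append((a, b))
        elif fit_b > fit_a:
            pairs.append((b, a))
    return pairs

def pairwise_logistic_loss(score_preferred, score_other):
    """Bradley-Terry negative log-likelihood: -log sigmoid(score_preferred - score_other)."""
    d = score_preferred - score_other
    # Numerically stable form of log(1 + exp(-d)).
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Toy support set: four mutants with measured fitness (hypothetical values).
support = [("A24G", 1.3), ("L56P", 0.2), ("K7R", 0.9), ("D101N", 0.5)]
pairs = preference_pairs(support)

# With 64 distinct fitness values, the number of pairs is 64 * 63 / 2.
n_pairs_64 = len(preference_pairs([(str(i), float(i)) for i in range(64)]))
```

With no ties, N labeled mutants yield N(N-1)/2 pairs, so 64 measurements give 2,016 training comparisons.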

32 -- High-Resolution Regulatory Mapping for Sequence-to-Function Modeling in Crop Plants

Evan Groover, David Ding, Gonzalo Benegas, Flora Wang, Stephen Chen, Michael Moubarak, Krishna Niyogi, Yun Song, David Savage

Predicting how non-coding DNA sequence variation quantitatively alters gene expression remains a central challenge for crop breeding and improvement. While deep learning models have shown promise in modeling regulatory sequence to predict crop phenotypes, their development is constrained by limited availability of high-resolution datasets for training and refinement. To address this, we developed a novel massively parallel regulatory mapping approach in the globally important cereal crop Sorghum bicolor. We systematically assay tens of thousands of CRISPR-like cis-regulatory perturbations (including single-nucleotide substitutions, deletions, and insertions) across multiple genes, yielding dense sequence-function maps that link perturbations to quantitative expression outcomes. This dataset provides a benchmark resource for training, evaluating, and refining genomic language models of regulatory sequence function. We demonstrate its utility by refining a DNA sequence language model pretrained via unsupervised learning on plant genomic sequence. Our results show that this modeling framework is effective at predicting the effects of small, local perturbations, but substantially less accurate for higher-order rearrangements and insertions, delineating current limits of sequence-based generalization. Notably, we observe that regulatory effects are strongly locus-dependent, highlighting limitations of generalist models and motivating context-aware approaches for regulatory prediction. Together, this work establishes a scalable experimental framework for building, refining, and validating sequence-to-function models for predictive crop engineering.

33 -- no poster

34 -- Learning Universal Representations of Intermolecular Interactions with ATOMICA

Ada Fang, Michael Desgagné, Zaixi Zhang, Andrew Zhou, Joseph Loscalzo, Bradley L. Pentelute, Marinka Zitnik

Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. Here, we introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across five modalities, including proteins, small molecules, metal ions, lipids, and nucleic acids. ATOMICA is trained on 2,037,972 interaction complexes using self-supervised denoising and masking to generate embeddings of interaction interfaces at the levels of atoms, chemical blocks, and molecular interfaces. ATOMICA’s latent space is compositional and captures physicochemical features shared across molecular classes, enabling representations of new molecular interactions to be generated by algebraically combining embeddings of interaction interfaces. The representation quality of this space improves with increased data volume and modality diversity. As in pre-trained natural language models, this scaling law implies predictable gains in performance as structural datasets expand. We construct modality-specific interfaceome networks, termed ATOMICANETs, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. By overlaying disease-associated proteins of 27 diseases onto ATOMICANETs, we find strong associations for asthma in lipid networks and myeloid leukemia in ion networks. We use ATOMICA to annotate the dark proteome—proteins lacking known function—by predicting 2,646 uncharacterized ligand-binding sites, including putative zinc finger motifs and transmembrane cytochrome subunits. We experimentally confirm heme binding for five ATOMICA predictions in the dark proteome. By modeling molecular interactions, ATOMICA opens new avenues for understanding and annotating molecular function at scale.

35 -- Quantifying inheritance of protein regulatory networks within yeast cell lineages via fused ordinary differential equation modeling

Shirley Mathur, Wenbin Wu, Taylor Kennedy, Orlando Argüello-Miranda, Kevin Z. Lin

A fundamental question in cell biology is how regulatory mechanisms are inherited during cell division. Time-lapse microscopy of dividing S. cerevisiae (yeast) offers new biological data to better understand this mechanism, but the computational tools to analyze such data are still underexplored. We propose a novel method called PRNODE (Protein Regulatory Networks via Ordinary Differential Equations) based on graphical lasso to 1) jointly estimate protein regulatory networks for a mother cell and one of its daughters and 2) quantify the inheritance of a regulatory network from the mother cell to its daughter. This is methodologically critical: because daughter cells are born during the time-lapse experiment, they have fewer longitudinal samples, which risks a less accurate estimate of their regulatory network. By estimating the mother and daughter networks jointly (i.e., "fused" estimation) in a statistical regularization framework, we demonstrate that our method increases the accuracy of estimating the regulatory networks while not diminishing its ability to assess different degrees of inheritance.

We apply our method to 181 pairs of mother-daughter cells to understand the inheritance of protein regulatory networks. For this, we collected time-lapse microscopy data tracking the fluorescence intensity of 6 proteins (Cdc10, Stb3, CLB5, Whi5, Xbp1, and Tup1) simultaneously at a 12-minute sampling interval over a 9+ hour experiment. We observe certain regulatory mechanisms that are consistently inherited by daughter cells. Furthermore, we estimate the amount of inheritance, which depends on the yeast cluster. We validate our inheritance metric by applying PRNODE to mother cells that are synthetically paired with unrelated daughter cells. We demonstrate that our estimated amount of inheritance is often higher in the observed mother-daughter pairings when compared to these synthetic mother-daughter pairings.

36 -- no poster

37 -- CONCERT predicts niche-aware perturbation responses in spatial transcriptomics

Xiang Lin, Zhenglun Kong, Soumya Ghosh, Manolis Kellis, Marinka Zitnik

Spatial perturbation transcriptomics measures how genetic or chemical edits alter gene expression while preserving tissue context. Perturbation outcomes depend on a cell’s intrinsic state and also on how effects propagate across cellular microenvironments. We present CONCERT, a niche-aware generative model that embeds perturbation context and learns spatial kernels with a Gaussian process variational autoencoder to predict perturbation effects across tissue. We formalize three tasks: patch, border, and niche, predicting responses in nearby unperturbed regions, at tissue interfaces, and as a function of surrounding microenvironments. We evaluate CONCERT on Perturb-map lung datasets. CONCERT outperforms state-of-the-art models (dissociated counterfactuals, spatialized perturbation models, and kNN), reducing E-distance by up to 33.77% (patch), 26.05% (border), and 33.74% (niche) versus the next best, with mean absolute error down by up to 23.28% and Pearson correlation up by up to 9.10%. Two case studies go beyond benchmarking. In dextran sodium sulfate-induced colitis, CONCERT reconstructs spatial gene expression at unmeasured time points, produces longitudinal comparisons across unpaired mice, resolves inter-mouse heterogeneity, and recovers consistent temporal declines of inflammation-associated genes across regions. In ischemic stroke, CONCERT predicts responses under variable lesion sizes and in a 3D formulation across brain sections, capturing lesion-core and peri-lesion patterns. CONCERT performs niche-aware counterfactual prediction, reconstructs missing spatial data, and models perturbation responses across tissues.
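The E-distance reported above compares a predicted population of cells with an observed one. As a rough illustration, the sketch below computes the classical energy distance between two point sets: twice the mean between-group distance minus the two mean within-group distances. The abstract does not state which variant was used, so the Euclidean metric and the all-pairs (self-pairs included) averaging are assumptions, and the function names are hypothetical.

```python
import math

def mean_pairwise_distance(xs, ys):
    """Mean Euclidean distance over all (x, y) pairs, one point from each set."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def energy_distance(xs, ys):
    """Energy distance: 2 * between-group mean minus the two within-group means."""
    return (2 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))

# Toy 2-D "expression profiles" for observed and predicted cells.
observed = [(0.0, 0.0), (1.0, 0.0)]
predicted = [(3.0, 0.0), (4.0, 0.0)]
e_dist = energy_distance(observed, predicted)
```

The value is zero when the two sets coincide and grows as the predicted cells drift away from the observed ones, so a lower E-distance indicates a closer match between predicted and measured responses.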

38 -- A comprehensive catalog of B cell gene expression programs reveals new disease-associated functional states

Dylan Kotliar, Michelle Curtis, Soumya Raychaudhuri

Therapeutic B cell depletion is effective across multiple autoimmune diseases, yet the specific pathogenic B cell states responsible for durable clinical benefit remain poorly defined. Progress toward selective targeting of these populations requires scalable, reproducible approaches for annotating B cell states across heterogeneous single-cell datasets.

We previously developed star-CellAnnoTator (starCAT), a framework for identifying reproducible gene expression programs (GEPs) across diverse single-cell RNA-seq datasets. starCAT derives consensus GEPs (cGEPs) that can be applied as fixed annotation features across studies, enabling robust cell state inference despite technical variation and data sparsity. Here, we extend starCAT to the B cell lineage, analyzing approximately 1.25 million B lineage cells from over 2,000 samples spanning 49 tissues and disease contexts, including health, cancer, COVID-19, and rheumatoid arthritis. Cross-dataset integration yielded 43 B cell cGEPs, including programs associated with canonical lineage states as well as distinct cellular activity states.

We next demonstrate how these cGEPs can be used to interpret perturbation responses by applying them to experimentally BCR-stimulated B cells. This analysis reveals reproducible activation-associated programs that generalize across datasets, including a metabolic activation signature shared with previously described T cell activation states and a chemokine-associated inflammatory response enriched in viral infection contexts. These results illustrate how starCAT enables consistent identification of activated B cells without reliance on dataset-specific clustering or marker selection.

Finally, we show that the use of predefined cGEPs mitigates data sparsity and improves interpretability in spatial transcriptomics data. Applying the B cell cGEP framework to Xenium data from breast adenocarcinoma improves cell state annotation and reveals spatial associations between activated immune populations. Together, this work establishes a generalizable, data-efficient framework for B cell state annotation that can be readily applied across single-cell and spatial genomic datasets, facilitating systematic study of B cell heterogeneity in health and disease.

39 -- Predicting Activation Domain Variant Effects Using Deep Learning Models

Pooja Agarwal, Sanjana Kotha, Max Staller

Protein variants of uncertain significance (VUS) have been identified in autism patients in activation domains of transcription factors. Activation domains lie in intrinsically disordered regions, meaning they lack a well-defined structure and exist in multiple conformers. This constrains our ability to predict activation domains from sequence and understand how mutants impact their activity; existing structure-based models fail to predict the impact of mutations. Because experimental techniques can be lengthy, four deep learning models have emerged as alternative predictors. I investigated these predictors, using interpretability techniques like SHAP and DeepSHAP to find that the effects of aromatic, acidic, and positive residues on predicted gene activation depend on their location within the sequence. In addition, I found that the models lacked a comprehensive ability to predict the effects of point mutations in samples similar to their training set and in experimental data with human activation domains. Overall, the models were less transferable to predicting human activation domains, as they had been trained on plant and yeast data. A small number of VUS from ASD patients were predicted to have statistically significant impacts on activity; however, these did not always correlate with annotated ClinVar pathogenicity scores. Existing variant effect predictors like EVE and AlphaMissense also failed to correctly identify the effect of mutations in activation domains, as they are recognized to be less effective in disordered regions. We intend to test variant effect predictors designed for intrinsically disordered regions on existing experimental data to benchmark their abilities in activation domains. This work clarifies how current models function, shows that new activity predictors are necessary for understanding point mutations, and identifies variants that could be linked to ASD, improving diagnosis for patients with novel mutations.

40 -- no poster

41 -- Mathematical Model-Driven Deep Learning Enables Personalized Adaptive Therapy

Kit Gallagher

Standard-of-care treatment regimens have long been designed for maximal cell killing, yet these strategies often fail when applied to metastatic cancers due to the emergence of drug resistance. Adaptive treatment strategies have been developed as an alternative approach, dynamically adjusting treatment to suppress the growth of treatment-resistant populations and thereby delay, or even prevent, tumor progression. Promising clinical results in prostate cancer indicate the potential to optimize adaptive treatment protocols. Here, we applied deep reinforcement learning (DRL) to guide adaptive drug scheduling and demonstrated that these treatment schedules can outperform the current adaptive protocols in a mathematical model calibrated to prostate cancer dynamics, more than doubling the time to progression. The DRL strategies were robust to patient variability, including both tumor dynamics and clinical monitoring schedules. The DRL framework could produce interpretable, adaptive strategies based on a single tumor burden threshold, replicating and informing optimal treatment strategies. The DRL framework had no knowledge of the underlying mathematical tumor model, demonstrating the capability of DRL to help develop treatment strategies in novel or complex settings. Finally, a proposed five-step pathway, which combined mechanistic modeling with the DRL framework and integrated conventional tools to improve interpretability compared with traditional “black-box” DRL models, could allow translation of this approach to the clinic. Overall, the proposed framework generated personalized treatment schedules that consistently outperformed clinical standard-of-care protocols.
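The idea of a single tumor burden threshold can be illustrated with a toy simulation. The sketch below is not the calibrated prostate cancer model or the DRL agent from this work; it assumes a generic two-population Lotka-Volterra model (drug-sensitive and drug-resistant cells sharing one carrying capacity), and every parameter value, the function names `simulate` and `threshold_policy`, and the progression criterion (burden exceeding 1.2 times its initial value) are illustrative assumptions.

```python
def simulate(policy, s=0.74, r=0.01, r_s=0.027, r_r=0.0135,
             kill=1.5, capacity=1.0, max_days=10000):
    """Euler simulation (1-day steps) of a toy tumor model.

    s, r: sensitive and resistant cell burdens (hypothetical units).
    policy: maps relative burden (burden / initial burden) to a dose in [0, 1].
    Returns the day of progression, or None if it never occurs.
    """
    n0 = s + r
    for day in range(max_days):
        burden = s + r
        if burden > 1.2 * n0:  # assumed progression criterion
            return day
        dose = policy(burden / n0)
        crowding = 1.0 - burden / capacity  # shared competition term
        # The drug only affects sensitive cells; resistant cells are
        # held back solely by competition for space.
        s += r_s * s * crowding * (1.0 - kill * dose)
        r += r_r * r * crowding
    return None


def continuous_policy(relative_burden):
    """Always treat at full dose."""
    return 1.0


def threshold_policy(threshold):
    """Treat only while relative burden is at or above the threshold."""
    return lambda relative_burden: 1.0 if relative_burden >= threshold else 0.0


continuous_ttp = simulate(continuous_policy)
adaptive_ttp = simulate(threshold_policy(1.0))
print("continuous:", continuous_ttp, "days; threshold rule:", adaptive_ttp, "days")
```

In this toy setting the threshold rule keeps sensitive cells alive so that they continue to compete with resistant cells, which delays progression relative to continuous dosing. How large that delay is depends entirely on the assumed parameters.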

42 -- Evaluating Single-Cell Perturbation Response Models Is Far from Straightforward

Mahshid Heidari, Mina Karimpour, Sumana Srivatsa, Hesam Montazeri

Accurately predicting cellular responses to perturbations at single-cell resolution is a central challenge in modern biology and a critical step toward developing in silico virtual cells. Although various computational models have been proposed, their evaluation remains poorly standardized and often misleading, leaving the true capabilities and limitations of existing methods unclear. This study makes two primary contributions: (i) a systematic evaluation of commonly used performance metrics for single-cell perturbation prediction, and (ii) a rigorous benchmarking of computational models.

First, we examined commonly used average-based and distribution-based evaluation metrics using real perturbation datasets, simulated data, and controlled noise experiments. We showed that average-based metrics are strongly influenced by gene expression scale and sparsity and completely ignore gene–gene interactions. We further demonstrated that the Wasserstein distance, despite its central role in optimal transport–based models, can yield misleading divergence estimates in high-dimensional gene expression spaces. Furthermore, we found that Energy distance may overlook disruptions in gene–gene interactions in certain contexts. Lastly, we showed that a subset of genes, which we term trivial genes, can artificially inflate apparent model performance in differential gene expression analyses. We proposed Local Energy distance and a clustering-based Mixing Index, which quantifies the co-clustering of predicted and observed perturbed cells.
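The point that average-based metrics ignore gene–gene interactions can be seen in a small constructed example. The sketch below is an illustration under stated assumptions, not the evaluation code from this study: the two-gene toy populations and the function names are hypothetical, and the energy distance is computed with the standard sample formula 2·E|X−Y| − E|X−X′| − E|Y−Y′| using Euclidean distance.

```python
import math
from itertools import product


def mean_expression_gap(cells_a, cells_b):
    """Average-based metric: mean absolute difference of per-gene means."""
    n_genes = len(cells_a[0])
    gap = 0.0
    for g in range(n_genes):
        mean_a = sum(cell[g] for cell in cells_a) / len(cells_a)
        mean_b = sum(cell[g] for cell in cells_b) / len(cells_b)
        gap += abs(mean_a - mean_b)
    return gap / n_genes


def mean_pairwise_distance(cells_a, cells_b):
    """Mean Euclidean distance over all (cell in a, cell in b) pairs."""
    total = sum(math.dist(p, q) for p, q in product(cells_a, cells_b))
    return total / (len(cells_a) * len(cells_b))


def energy_distance(cells_a, cells_b):
    """Sample energy distance between two sets of cells."""
    return (2.0 * mean_pairwise_distance(cells_a, cells_b)
            - mean_pairwise_distance(cells_a, cells_a)
            - mean_pairwise_distance(cells_b, cells_b))


# Toy data: in "observed" the two genes are co-expressed, in "predicted"
# they are mutually exclusive. Per-gene means are 0.5 in both populations.
observed = [(0.0, 0.0), (1.0, 1.0)] * 50
predicted = [(0.0, 1.0), (1.0, 0.0)] * 50

print(mean_expression_gap(observed, predicted))  # 0.0: the means agree
print(energy_distance(observed, predicted))      # about 0.586 (2 - sqrt(2))
```

Here the average-based score reports a perfect prediction although the gene–gene relationship is inverted, while the distribution-based score is clearly nonzero. This toy case is one where energy distance does register the change; it says nothing about the contexts in which it may not.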

Second, we benchmarked two representative deep-learning models (CPA and scPRAM), a conditional autoencoder, multiple baseline strategies, and an idealized reference model defining empirical performance bounds. Across both out-of-distribution and partially in-distribution settings, we found that current deep-learning models exhibit limited generalization capacity and consistently fail to reproduce the full distribution of perturbed cell states, even when partially exposed to target perturbed cells during training.

Together, our results reveal fundamental shortcomings in both existing evaluation practices and current modeling approaches and provide practical guidelines and tools for more reliable benchmarking of single-cell perturbation response models.

Biomedical Science and AI Symposium

Day 1: Tuesday, April 28

Day 2: Wednesday, April 29
