22 -- Cross-modal transfer learning for mapping bulk cancer phenotypes at single-cell resolution
Aarthi Venkat, Daniel Marbach, Zhiwen Jiang, Nir Hacohen, Marinka Zitnik
In cancer, single-cell transcriptomics has revealed deep insights into the tumor microenvironment, capturing gene programs, cell states, and multicellular hubs that underlie tumor initiation, progression, and treatment resistance. Despite their potential to yield clinically actionable biomarkers, single-cell technologies are not typically used for clinical decision-making due to financial, conceptual, and computational constraints. Instead, bulk transcriptomic technologies, which measure the average expression across entire samples rather than individual cells, are a more routine component of cancer care. However, approaches to derive patient endotypes and predict treatment response from bulk data alone are fundamentally limited in their ability to characterize cell-type-specific changes and often exhibit poor generalizability across diseases and cohorts due to the reliance on predefined sets of biomarkers.
Here, we present a parameter-efficient transfer learning approach to flexibly identify cellular and molecular factors underlying bulk clinical phenotypes. We propose an ontology-aware contrastive learning objective that first models cell type proportions from single-cell data as signals on the Cell Ontology graph, then aligns single-cell and pseudobulked samples to preserve pairwise similarities defined by graph-based optimal transport. This enables learning bulk representations that capture cell type-level differences. We further propose a cohort-aware hard negative sampling strategy to improve cross-cohort generalization. Trained with 1,631 single-cell tumor samples from the Curated Cancer Cell Atlas (3CA), our approach improves preservation of cell type proportion similarity, achieving a 19.5% increase in within-cohort prediction and a 40.7% increase in cross-cohort prediction over existing methods across five independent cohorts. We additionally fine-tune our model on 330 bulk melanoma samples using a large-scale single-cell melanoma atlas, enabling prediction of immunotherapy response, overall survival, and associated cell populations enriched in bulk samples. Together, our method learns patient representations that capture single-cell-level differences between bulk samples, providing an interpretable framework for population selection and patient matching across cohorts and indications.
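As a rough illustration of the two inputs the objective above works with, the sketch below builds pseudobulk profiles and per-sample cell type proportions from a toy single-cell matrix. It is not the authors' pipeline: the function names, the toy data, and the choice to sum counts (rather than average or normalize them) are assumptions made for the example.

```python
import numpy as np

def pseudobulk(counts, sample_ids):
    """Sum cell-level counts within each sample (assumed aggregation)."""
    samples = np.unique(sample_ids)
    bulk = np.stack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, bulk

def cell_type_proportions(cell_types, sample_ids):
    """Fraction of each cell type among the cells of each sample."""
    samples = np.unique(sample_ids)
    types = np.unique(cell_types)
    props = np.array([
        [np.mean(cell_types[sample_ids == s] == t) for t in types]
        for s in samples
    ])
    return samples, types, props

# Toy data: 5 cells x 3 genes, two samples, two cell types (all hypothetical).
counts = np.array([[1, 0, 2],
                   [0, 3, 1],
                   [2, 2, 0],
                   [4, 0, 0],
                   [0, 1, 1]])
sample_ids = np.array(["s1", "s1", "s1", "s2", "s2"])
cell_types = np.array(["T", "B", "T", "T", "B"])

samples, bulk = pseudobulk(counts, sample_ids)
_, types, props = cell_type_proportions(cell_types, sample_ids)
```

Each row of `props` sums to one, so it can be read as a distribution over cell types for that sample, which is the kind of signal the abstract describes placing on the Cell Ontology graph.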
23 -- Towards Generalist Agents for Accelerating Biomedical Discovery
Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G. Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, Cassandra Masschelein, Yingze Wang, Haorui Wang, Haojun Jia, Chao Zhang, Hongyu Zhao, Martin Ester, Teresa Head-Gordon, Carla P. Gomes, Huan Sun, Chenru Duan, Philippe Schwaller, Wengong Jin
Large language models (LLMs) are opening new frontiers in scientific research, enabling capabilities ranging from literature retrieval and hypothesis generation to experiment planning and operation. In this talk, I will first present quantitative evidence that LLMs encode substantial scientific knowledge and that appropriate sampling/search strategies can reliably extract it. I will then focus on a key missing ingredient for applying LLMs to discovery in the natural sciences: automating objective-function design. To address this, I will introduce the Scientific Autonomous Goal-Evolving Agent (SAGA), which analyzes optimization outcomes, proposes improved objectives, and translates them into computable scoring functions with end-to-end validation. I will demonstrate SAGA across diverse discovery settings, including antibiotic design, inorganic materials design, functional DNA sequence design, and chemical process design. Finally, I will summarize lessons learned and discuss implications for the next generation of generalist scientific agents.
24 -- Biosynthesis of non-standard amino acids with AI
Anush Chiappino-Pepe*, Peter G Mikhael, Itamar Chinn, Bogdan Budnik, Alexander Pauer, Taylor Lanosky, Jenny M. Tam, Gregory Stephanopoulos, Regina Barzilay, George M. Church
Molecules with new functional groups that expand the amino acid alphabet could enable new forms of molecular recognition, cellular engineering, information storage, and catalysis, yet their availability has been limited to lengthy and costly chemical synthesis. Here we have discovered enzymes for the de novo biosynthesis of new non-standard amino acids. Using a machine-learning-guided search across plant proteomes coupled with ultra-sensitive metabolomics, we identified and characterized non-standard amino acid synthases that catalyze an unprecedented biochemistry for non-standard amino acid biosynthesis. These non-standard amino acid synthases define a new subclass of enzymes that expands the canonical scope of amino acid biosynthesis. Phylogenomic analyses indicate that these enzymes are widespread across angiosperms, revealing a previously unrecognized connection in amino acid metabolism. By uncovering nature’s capacity to expand the amino acid alphabet of life, this work opens new avenues for programmable molecular architectures and expanded biological codes.
25 -- Multimodal Characterization of Sleep as a Phenotype in Neurodevelopmental Conditions
Sanketh Vedula, Andrew Kim, Manoj Kumar, Olga Troyanskaya, Guillermo Sapiro
Sleep is an important and quantifiable phenotype that reflects underlying neurophysiological processes as well as behavioral and health-related factors. A growing body of work has reported associations between neurodevelopmental and neurodegenerative conditions and alterations in sleep timing, continuity, and variability, suggesting that sleep may provide a useful lens for studying these conditions. However, much of the existing literature relies on coarse or cross-sectional measurements, limiting the ability to capture heterogeneity across individuals and nights and motivating the use of longitudinal, multimodal data to study sleep more systematically.
In this study, we examine sleep patterns using data from the Simons Sleep Project, a home-based autism cohort (N=102; control=98) that combines multi-night objective sleep measurements with behavioral questionnaires and genetic information. The dataset includes physiological recordings from wearable devices such as EEG-enabled headbands, wrist-worn sensors, and under-mattress sleep monitors, alongside daily sleep diaries, standardized behavioral assessments, and whole-exome sequencing. This multimodal structure enables joint analysis of sleep timing, continuity, variability, and neurophysiological proxies in real-world settings, and supports investigation of how these features relate to behavioral phenotypes and genetic background.
We use multivariate statistical analysis frameworks to compare sleep features across individuals with Autism Spectrum Disorder (ASD), Attention Deficit Hyperactivity Disorder (ADHD), and neurotypical controls. The analysis focuses on distributional differences, intra-individual variability, and cross-modal relationships across physiological and behavioral measurements rather than diagnostic classification alone. In addition to conventional sleep summary metrics, we incorporate learned feature representations derived from foundation models that capture structure directly from physiological signals and reduce dependence on explicit sleep staging assumptions. In this poster, we will present findings from our ongoing study and discuss how multimodal analysis can support conservative and interpretable characterization of sleep heterogeneity, as well as generate hypotheses regarding biological mechanisms that may contribute to observed sleep differences.
26 -- no poster
27 -- Tell me who you walk with and I’ll tell you who you are: Bayesian network analysis reveals inflammation-dependent fungal interaction networks in ulcerative colitis from an underrepresented Latin American Cohort
Tamara Pérez-Jeldres, Ivania Valdés, Gabriel Ascui, Roberto Segovia, Cristian Hernández-Rocha, Carolina Pavez, Elisa Hernández, Verónica Silva, Elizabeth Arriagada, Lorena Azócar, Manuel Álvarez-Lobos, Erick Riquelme, Archana Sharma-Oates
IBD microbiome research has largely focused on bacterial dysbiosis in European populations, leaving the fungal component and population-specific interactions underexplored. We performed a comprehensive mycobiome and host integration analysis in a Chilean ulcerative colitis (UC) cohort including active UC, inactive UC, and non-IBD controls. Fungal community composition was profiled using ITS sequencing and analyzed using centered log-ratio–transformed abundances. Differentially abundant fungal species were identified across clinical comparisons using complementary statistical approaches, including LEfSe (LDA) and MaAsLin2. Species consistently detected as differentially abundant were used as candidate features for downstream analyses. To evaluate the discriminatory capacity of these fungal signatures, we constructed supervised classification models to distinguish clinical states using Random Forest and XGBoost. Model performance was assessed using standard classification metrics, and feature contributions were quantified with SHAP values. Fungal species that consistently contributed to accurate classification across comparisons were retained as robust, disease-associated features. These selected fungal species were subsequently used to infer state-specific Bayesian networks for active UC, inactive UC, and control groups. Network structure learning was performed using bootstrap resampling to identify stable conditional dependencies among fungal species within each clinical state. This approach revealed inflammation-associated reorganization of fungal interaction networks, including changes in network connectivity, the number of connected components, and shifts in hub species that were not explained by abundance changes alone. In a subset of patients with available bulk RNA-sequencing data, Bayesian network–derived fungal species were integrated with host transcriptomic profiles.
Variance-stabilized gene expression data were correlated with fungal abundances using within-state Spearman analyses, identifying coordinated fungi–host associations linked to immune and inflammatory pathways. Our results show that UC is associated with inflammation-dependent restructuring of fungal interaction networks and highlight the value of combining differential abundance testing, machine learning, and Bayesian network inference to characterize population-specific host–mycobiota interactions in underrepresented IBD cohorts.
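For readers unfamiliar with the centered log-ratio (CLR) transform mentioned in this abstract, a minimal sketch follows. The abstract does not say how zero counts were handled, so the pseudocount of 1 is an assumption, as are the function name and the toy count table.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumption here) avoids taking log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtracting the per-sample mean of the logs is the same as dividing
    # each abundance by the sample's geometric mean before taking the log.
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical ITS read counts: 2 samples x 4 fungal taxa.
counts = np.array([[10, 0, 5, 85],
                   [3, 40, 7, 50]])
clr_values = clr(counts)
```

After the transform, each sample's values sum to zero, so abundances are expressed relative to the sample's own geometric mean rather than to sequencing depth.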
28 -- CRIMP efficiently quantifies sub-cellular gene expression patterns from image-based spatial transcriptomics
Claire Wu, Hannah M. Schlüter, Uthsav Chitra, Caroline Uhler
Imaging-based spatial transcriptomics (iST) technologies measure mRNA transcripts at sub-cellular resolution, offering insight into how gene expression is distributed within individual cells. However, existing approaches for quantifying sub-cellular expression patterns either do not account for the high sparsity of iST data, or are computationally expensive and rely on geometric transformations that are often inaccurate for irregularly shaped cells. We introduce CRIMP (for Continuous Radial Intra-cellular Molecular Patterns), a highly efficient computer vision-based algorithm for quantifying radial, sub-cellular gene expression patterns. CRIMP models gene expression as a function of the “effective radius”—a 1-D coordinate measuring the distance from cell and nuclear boundaries that is computed using signed distance functions and is applicable for cells of any shape. CRIMP uses the effective radius to robustly quantify sub-cellular expression patterns for individual genes as well as identify groups of genes (modules) with shared sub-cellular expression patterns. We demonstrate that CRIMP accurately identifies sub-cellular expression patterns on data simulated at realistically sparse transcript densities. We further derive a statistical test for differential localization, a sub-cellular analogue of differential expression, measuring whether a gene has different sub-cellular localization patterns in two different groups of cells. On iST data from fibroblasts in culture and mouse brain tissue, CRIMP identifies novel functional gene modules and genes with differential localization across cell types.
29 -- BRIDGE: Benchmarking Large Language Models for Understanding Global Real-world Clinical Practice Text
Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang
Background: Large language models (LLMs) show promise in clinical applications, yet current benchmarks mostly rely on medical exam-style or PubMed-derived text, failing to reflect the complexity of real-world clinical contexts.
Objective: To introduce BRIDGE, a multilingual benchmark for real-world clinical text understanding, and to systematically characterize performance across a broad set of advanced LLMs, prompting strategies, languages, task types, and clinical specialties.
Methods: We curated publicly accessible clinical practice text from multiple global sources and constructed 87 tasks spanning 9 languages, 8 task types, 14 clinical specialties, and multiple stages of patient care. Tasks were standardized with consistent instructions, input–output formats, and evaluation protocols, and organized with a structured taxonomy by task nature and clinical scenario. We evaluated 95 state-of-the-art LLMs, including proprietary models (e.g., GPT-4o, Gemini), open-source general models (e.g., DeepSeek-R1, Qwen3 series), and medical-domain LLMs (e.g., MedGemma, Me-LLaMA). For each model, we tested three inference strategies: (1) zero-shot QA, (2) chain-of-thought (CoT) with explicit reasoning plus final answer, and (3) few-shot QA using five randomly sampled completed exemplars as in-context demonstrations. We further built a public-facing leaderboard enabling multi-level analyses across model families, strategies, languages, task types, and specialties.
Results: Performance varied substantially across languages, task types, and specialties. Under zero-shot prompting, the best overall models achieved scores around the mid-40s, with Gemini-2.5-Flash (44.8), DeepSeek-R1 (44.2), and GPT-4o (44.2) leading. Scaling trends largely held within model families (e.g., Llama, Qwen, Me-LLaMA). Few-shot prompting improved 91/95 models (95.8%), with Gemini-1.5-Pro rising from 43.8 to 55.5. In contrast, CoT—while improving interpretability—could reduce performance.
Conclusions: BRIDGE provides a comprehensive, multilingual evaluation of LLMs on real-world clinical text, highlighting persistent gaps and high variability. Few-shot prompting offers a strong, practical boost, whereas CoT may be unreliable, motivating further work on effectiveness, stability, and safer deployment in complex clinical settings.
30 -- Sensitive detection of rare immune cells in checkpoint inhibitor–induced myocarditis with scBoost
Fatima Zahra El Hajji, Salim El Mejjad, Salwa Enezari, Anas Bedraoui, Rachid El Fatimy, Tariq Daouda
Immune checkpoint inhibitors (ICIs) have transformed cancer therapy but can trigger severe immune-related toxicities. Among these, myocarditis is a life-threatening complication with high fatality rates. The cellular mechanisms underlying ICI-associated myocarditis (irMyocarditis) remain poorly defined, limiting early detection and risk stratification. Resolving this pathology requires high-resolution profiling of immune cell states, including rare pathogenic populations that may disproportionately drive tissue injury. Single-cell RNA sequencing (scRNA-seq) provides such resolution, yet integration of heterogeneous datasets often obscures rare but biologically important signals. Here, we introduce scBoost, a deep-learning framework designed to enhance rare cell type discovery in integrated scRNA-seq datasets. scBoost preserves critical biological signals through adaptive activation layers, enabling robust detection of sparse disease-relevant populations across complex, multi-batch datasets. We applied scBoost to a large irMyocarditis scRNA-seq cohort, including peripheral blood mononuclear cells (PBMCs; 366,066 cells) and cardiac-infiltrating immune cells (84,576 cells). While irMyocarditis is characterized by T-cell infiltration into the myocardium, scBoost identified a distinct population of proliferative cytotoxic CD8 T cells in peripheral blood prior to ICI therapy in patients who later developed myocarditis. These cells shared T-cell receptor clones with cycling CD8 T cells observed in the heart during disease, indicating systemic clonal expansion and targeted cardiac infiltration. Our findings reveal a rare circulating immune signature that precedes clinical onset of irMyocarditis and may serve as a predictive biomarker for immune-related cardiac toxicity. 
Beyond this, scBoost provides a generalizable framework for integrating scRNA-seq datasets to uncover clinically relevant rare cell populations, bridging computational innovation with translational immunology.
31 -- From Evolutionary Likelihood to Task-Specific Fitness: Few-Shot Adaptation for Efficient Protein Optimization
Zhilei Bei
Predicting the functional effects of protein mutations is a central challenge in protein engineering, yet experimental fitness measurements are expensive and scarce. This setting naturally gives rise to a few-shot task-conditioned inference problem, where each protein assay defines a task and a small support set of mutation–fitness pairs is used to infer the underlying fitness landscape. Existing approaches largely fall into two categories. Pretrained protein language models provide zero-shot mutation effect predictions based on evolutionary likelihoods, but are not adapted to task-specific fitness landscapes. In contrast, supervised fitness predictors can achieve higher accuracy, but require substantial labeled data and generalize poorly across proteins or assays. Together, these limitations motivate models that can infer task-aware protein-specific fitness landscapes from limited experimental data, enabling efficient protein optimization. We introduce a few-shot adaptation framework that aligns pretrained model likelihoods with task-specific fitness landscapes through preference learning, preserving the original model structure without requiring additional regression heads. To address data scarcity, we employ pairwise ranking objectives that transform N labeled examples into O(N²) preference pairs, substantially increasing the amount of effective supervision, and combine this with meta-transfer learning across multiple assays. We evaluate our method on deep mutational scanning datasets from ProteinGym, comparing against zero-shot scoring and supervised regression baselines. With as few as 64 labeled mutants per task, our approach achieves a 50% relative improvement in Spearman correlation over zero-shot baselines and matches or exceeds supervised models trained with 8× more labeled data. Overall, this work enables efficient and generalizable protein fitness modeling in low-data regimes, addressing a key bottleneck in practical protein engineering.
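The step from N labeled examples to O(N²) preference pairs can be illustrated with a short sketch. This is only one plausible construction, not the author's implementation: the function name, the mutant labels, the fitness values, and the decision to skip ties are all assumptions.

```python
from itertools import combinations

def preference_pairs(mutants, fitness):
    """Return (preferred, other) pairs for every two mutants whose fitness differs."""
    pairs = []
    for i, j in combinations(range(len(mutants)), 2):
        if fitness[i] == fitness[j]:
            continue  # ties carry no preference signal (assumed handling)
        if fitness[i] > fitness[j]:
            pairs.append((mutants[i], mutants[j]))
        else:
            pairs.append((mutants[j], mutants[i]))
    return pairs

# Hypothetical support set of four labeled mutants.
mutants = ["A12G", "L45P", "K7R", "D101N"]
fitness = [0.9, 0.1, 0.5, 0.7]
pairs = preference_pairs(mutants, fitness)
```

With no ties, N examples yield N(N-1)/2 pairs, so the 4 mutants here give 6 pairs and a 64-mutant support set would give 2,016.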
32 -- High-Resolution Regulatory Mapping for Sequence-to-Function Modeling in Crop Plants
Evan Groover, David Ding, Gonzalo Benegas, Flora Wang, Stephen Chen, Michael Moubarak, Krishna Niyogi, Yun Song, David Savage
Predicting how non-coding DNA sequence variation quantitatively alters gene expression remains a central challenge for crop breeding and improvement. While deep learning models have shown promise in modeling regulatory sequence to predict crop phenotypes, their development is constrained by limited availability of high-resolution datasets for training and refinement. To address this, we developed a novel massively parallel regulatory mapping approach in the globally important cereal crop Sorghum bicolor. We systematically assay tens of thousands of CRISPR-like cis-regulatory perturbations (including single-nucleotide substitutions, deletions, and insertions) across multiple genes, yielding dense sequence-function maps that link perturbations to quantitative expression outcomes. This dataset provides a benchmark resource for training, evaluating, and refining genomic language models of regulatory sequence function. We demonstrate its utility by refining a DNA sequence language model pretrained via unsupervised learning on plant genomic sequence. Our results show that this modeling framework is effective at predicting the effects of small, local perturbations, but substantially less accurate for higher-order rearrangements and insertions, delineating current limits of sequence-based generalization. Notably, we observe that regulatory effects are strongly locus-dependent, highlighting limitations of generalist models and motivating context-aware approaches for regulatory prediction. Together, this work establishes a scalable experimental framework for building, refining, and validating sequence-to-function models for predictive crop engineering.
33 -- no poster
34 -- Learning Universal Representations of Intermolecular Interactions with ATOMICA
Ada Fang, Michael Desgagné, Zaixi Zhang, Andrew Zhou, Joseph Loscalzo, Bradley L. Pentelute, Marinka Zitnik
Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. Here, we introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across five modalities, including proteins, small molecules, metal ions, lipids, and nucleic acids. ATOMICA is trained on 2,037,972 interaction complexes using self-supervised denoising and masking to generate embeddings of interaction interfaces at the levels of atoms, chemical blocks, and molecular interfaces. ATOMICA’s latent space is compositional and captures physicochemical features shared across molecular classes, enabling representations of new molecular interactions to be generated by algebraically combining embeddings of interaction interfaces. The representation quality of this space improves with increased data volume and modality diversity. As in pre-trained natural language models, this scaling law implies predictable gains in performance as structural datasets expand. We construct modality-specific interfaceome networks, termed ATOMICANETs, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. By overlaying disease-associated proteins of 27 diseases onto ATOMICANETs, we find strong associations for asthma in lipid networks and myeloid leukemia in ion networks. We use ATOMICA to annotate the dark proteome—proteins lacking known function—by predicting 2,646 uncharacterized ligand-binding sites, including putative zinc finger motifs and transmembrane cytochrome subunits. We experimentally confirm heme binding for five ATOMICA predictions in the dark proteome. By modeling molecular interactions, ATOMICA opens new avenues for understanding and annotating molecular function at scale.
35 -- Quantifying inheritance of protein regulatory networks within yeast cell lineages via fused ordinary differential equation modeling
Shirley Mathur, Wenbin Wu, Taylor Kennedy, Orlando Argüello-Miranda, Kevin Z. Lin
A fundamental question in cell biology is how regulatory mechanisms are inherited during cell division. Time-lapse microscopy of dividing S. cerevisiae (yeast) offers new biological data to better understand this mechanism, but the computational tools to analyze such data are still underexplored. We propose a novel method called PRNODE (Protein Regulatory Networks via Ordinary Differential Equations) based on graphical lasso to 1) jointly estimate protein regulatory networks for a mother cell and one of its daughters and 2) quantify the inheritance of a regulatory network from the mother cell to its daughter. This is methodologically critical: because daughter cells are born during the time-lapse experiment, they have fewer longitudinal samples, which risks less accurate estimation of their regulatory network. By jointly estimating the mother and daughter networks (i.e., "fused" estimation) in a statistical regularization framework, we demonstrate that our method increases the accuracy of estimating the regulatory networks while not diminishing its ability to assess different degrees of inheritance.
We apply our method to 181 pairs of mother-daughter cells to understand the inheritance of protein regulatory networks. For this, we collected time-lapse microscopy data tracking the fluorescence intensity of 6 proteins (Cdc10, Stb3, CLB5, Whi5, Xbp1, and Tup1) simultaneously with a 12-minute sampling interval over a 9+ hour experiment. We observe certain regulatory mechanisms that are consistently inherited by daughter cells. Furthermore, we estimate the amount of inheritance, which depends on the yeast cluster. We validate our inheritance metric by applying PRNODE to mother cells that are synthetically paired with unrelated daughter cells. We demonstrate that our estimated amount of inheritance is often higher in the observed mother-daughter pairings when compared to these synthetic mother-daughter pairings.
36 -- no poster
37 -- CONCERT predicts niche-aware perturbation responses in spatial transcriptomics
Xiang Lin, Zhenglun Kong, Soumya Ghosh, Manolis Kellis, Marinka Zitnik
Spatial perturbation transcriptomics measures how genetic or chemical edits alter gene expression while preserving tissue context. Perturbation outcomes depend on a cell’s intrinsic state and also on how effects propagate across cellular microenvironments. We present CONCERT, a niche-aware generative model that embeds perturbation context and learns spatial kernels with a Gaussian process variational autoencoder to predict perturbation effects across tissue. We formalize three tasks: patch, border, and niche, predicting responses in nearby unperturbed regions, at tissue interfaces, and as a function of surrounding microenvironments. We evaluate CONCERT on Perturb-map lung datasets. CONCERT outperforms state-of-the-art models (dissociated counterfactuals, spatialized perturbation models, and kNN), reducing E-distance by up to 33.77% (patch), 26.05% (border), and 33.74% (niche) versus the next best, with mean absolute error down by up to 23.28% and Pearson correlation up by up to 9.10%. Two case studies go beyond benchmarking. In dextran sodium sulfate-induced colitis, CONCERT reconstructs spatial gene expression at unmeasured time points, produces longitudinal comparisons across unpaired mice, resolves inter-mouse heterogeneity, and recovers consistent temporal declines of inflammation-associated genes across regions. In ischemic stroke, CONCERT predicts responses under variable lesion sizes and in a 3D formulation across brain sections, capturing lesion-core and peri-lesion patterns. CONCERT performs niche-aware counterfactual prediction, reconstructs missing spatial data, and models perturbation responses across tissues.
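The E-distance reported above is a distance between two sets of expression profiles, such as predicted and observed cells. The sketch below uses one common form of the energy distance with Euclidean distances; the abstract does not state which distance or estimator was used, so this form, the function names, and the toy data are assumptions.

```python
import numpy as np

def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (one row from a, one from b)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

def energy_distance(x, y):
    """Energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'| (assumed definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))

# Toy "cells x genes" matrices: a control set and a uniformly shifted copy.
rng = np.random.default_rng(0)
control = rng.normal(size=(40, 5))
shifted = control + 1.5

d_same = energy_distance(control, control)
d_shift = energy_distance(control, shifted)
```

The value is zero when both sets are identical and grows as the two distributions separate, which is why a lower E-distance between predicted and observed responses indicates a better prediction.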
38 -- A comprehensive catalog of B cell gene expression programs reveals new disease-associated functional states
Dylan Kotliar, Michelle Curtis, Soumya Raychaudhuri
Therapeutic B cell depletion is effective across multiple autoimmune diseases, yet the specific pathogenic B cell states responsible for durable clinical benefit remain poorly defined. Progress toward selective targeting of these populations requires scalable, reproducible approaches for annotating B cell states across heterogeneous single-cell datasets.
We previously developed star-CellAnnoTator (starCAT), a framework for identifying reproducible gene expression programs (GEPs) across diverse single-cell RNA-seq datasets. starCAT derives consensus GEPs (cGEPs) that can be applied as fixed annotation features across studies, enabling robust cell state inference despite technical variation and data sparsity. Here, we extend starCAT to the B cell lineage, analyzing approximately 1.25 million B lineage cells from over 2,000 samples spanning 49 tissues and disease contexts, including health, cancer, COVID-19, and rheumatoid arthritis. Cross-dataset integration yielded 43 B cell cGEPs, including programs associated with canonical lineage states as well as distinct cellular activity states.
We next demonstrate how these cGEPs can be used to interpret perturbation responses by applying them to experimentally BCR-stimulated B cells. This analysis reveals reproducible activation-associated programs that generalize across datasets, including a metabolic activation signature shared with previously described T cell activation states and a chemokine-associated inflammatory response enriched in viral infection contexts. These results illustrate how starCAT enables consistent identification of activated B cells without reliance on dataset-specific clustering or marker selection.
Finally, we show that the use of predefined cGEPs mitigates data sparsity and improves interpretability in spatial transcriptomics data. Applying the B cell cGEP framework to Xenium data from breast adenocarcinoma improves cell state annotation and reveals spatial associations between activated immune populations. Together, this work establishes a generalizable, data-efficient framework for B cell state annotation that can be readily applied across single-cell and spatial genomic datasets, facilitating systematic study of B cell heterogeneity in health and disease.
39 -- Predicting Activation Domain Variant Effects Using Deep Learning Models
Pooja Agarwal, Sanjana Kotha, Max Staller
Protein variants of uncertain significance (VUS) have been identified in autism patients in activation domains of transcription factors. Activation domains lie in intrinsically disordered regions, meaning they lack a well-defined structure and exist in multiple conformers. This constrains our ability to predict activation domains from sequence and understand how mutants impact their activity; existing structure-based models fail to predict the impact of mutations. Because experimental techniques can be lengthy, four deep learning models have emerged as alternative predictors. I investigated these predictors, using interpretability techniques like SHAP and DeepSHAP to find that the effects of aromatic, acidic, and positive residues on predicted gene activation depend on their location within the sequence. In addition, I found that the models lacked a comprehensive ability to predict the effects of point mutations in samples similar to their training set and in experimental data with human activation domains. Overall, the models were less transferable to predicting human activation domains, as they had been trained on plant and yeast data. A small number of VUS from ASD patients were predicted to have statistically significant impacts on activity; however, these did not always correlate with annotated ClinVar pathogenicity scores. Existing variant effect predictors like EVE and AlphaMissense also failed to correctly identify the effect of mutations in activation domains, as they are recognized to be less effective in disordered regions. We intend to test variant effect predictors designed for intrinsically disordered regions on existing experimental data to benchmark their abilities in activation domains. This work helps understand how current models function, shows that new activity predictors are necessary for understanding point mutations, and identifies variants that could be linked to ASD, improving diagnosis for patients with novel mutations.
40 -- no poster
41 -- Mathematical Model-Driven Deep Learning Enables Personalized Adaptive Therapy
Kit Gallagher
Standard-of-care treatment regimens have long been designed for maximal cell killing, yet these strategies often fail when applied to metastatic cancers due to the emergence of drug resistance. Adaptive treatment strategies have been developed as an alternative approach, dynamically adjusting treatment to suppress the growth of treatment-resistant populations and thereby delay, or even prevent, tumor progression. Promising clinical results in prostate cancer indicate the potential to optimize adaptive treatment protocols. Here, we applied deep reinforcement learning (DRL) to guide adaptive drug scheduling and demonstrated that these treatment schedules can outperform the current adaptive protocols in a mathematical model calibrated to prostate cancer dynamics, more than doubling the time to progression. The DRL strategies were robust to patient variability, including both tumor dynamics and clinical monitoring schedules. The DRL framework could produce interpretable, adaptive strategies based on a single tumor burden threshold, replicating and informing optimal treatment strategies. The DRL framework had no knowledge of the underlying mathematical tumor model, demonstrating the capability of DRL to help develop treatment strategies in novel or complex settings. Finally, a proposed five-step pathway, which combined mechanistic modeling with the DRL framework and integrated conventional tools to improve interpretability compared with traditional “black-box” DRL models, could allow translation of this approach to the clinic. Overall, the proposed framework generated personalized treatment schedules that consistently outperformed clinical standard-of-care protocols.
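The idea of an adaptive schedule driven by a single tumor burden threshold can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-population competition model (drug-sensitive and drug-resistant cells sharing one carrying capacity), all parameter values, the progression criterion of 1.2 times the initial burden, and the function names. It is not the calibrated prostate cancer model or the DRL framework from the abstract.

```python
def time_to_progression(policy, s0=0.74, r0=0.01, rate=0.027, capacity=1.0,
                        drug_kill=1.5, dt=1.0, horizon=5000.0):
    """Forward-Euler simulation of a hypothetical toy tumor model.

    s: drug-sensitive cells, r: drug-resistant cells (fractions of capacity).
    Returns the first time total burden exceeds 1.2x its initial value,
    or `horizon` if that never happens within the simulated window.
    """
    s, r = s0, r0
    n0 = s0 + r0
    t = 0.0
    while t < horizon:
        n = s + r
        if n > 1.2 * n0:                 # assumed progression criterion
            return t
        dose = policy(n / n0)            # policy sees burden relative to baseline
        crowding = 1.0 - n / capacity    # shared logistic competition term
        # Drug acts only on sensitive cells; resistant cells ignore it.
        s += dt * rate * s * crowding * (1.0 - drug_kill * dose)
        r += dt * rate * r * crowding
        t += dt
    return horizon


def continuous(relative_burden):
    """Maximum-dose policy: always treat."""
    return 1.0


def make_threshold_policy(threshold):
    """Treat only while burden is at or above `threshold` x baseline."""
    def policy(relative_burden):
        return 1.0 if relative_burden >= threshold else 0.0
    return policy


ttp_continuous = time_to_progression(continuous)
ttp_threshold = time_to_progression(make_threshold_policy(1.0))
print("continuous therapy, time to progression:", ttp_continuous)
print("threshold therapy, time to progression:", ttp_threshold)
```

In this toy setting, holding the burden near baseline keeps the crowding term small, which slows the resistant population. This is the competition mechanism that adaptive therapy relies on. The numbers it prints say nothing about the results reported in the abstract.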
42 -- Evaluating Single-Cell Perturbation Response Models Is Far from Straightforward
Mahshid Heidari, Mina Karimpour, Sumana Srivatsa, Hesam Montazeri
Accurately predicting cellular responses to perturbations at single-cell resolution is a central challenge in modern biology and a critical step toward developing in silico virtual cells. Although various computational models have been proposed, their evaluation remains poorly standardized and often misleading, leaving the true capabilities and limitations of existing methods unclear. This study makes two primary contributions: (i) a systematic evaluation of commonly used performance metrics for single-cell perturbation prediction, and (ii) a rigorous benchmarking of computational models.
First, we examined commonly used average-based and distribution-based evaluation metrics using real perturbation datasets, simulated data, and controlled noise experiments. We showed that average-based metrics are strongly influenced by gene expression scale and sparsity and completely ignore gene–gene interactions. We further demonstrated that the Wasserstein distance, despite its central role in optimal transport–based models, can yield misleading divergence estimates in high-dimensional gene expression spaces. Furthermore, we found that Energy distance may overlook disruptions in gene–gene interactions in certain contexts. Lastly, we showed that a subset of genes, which we term trivial genes, can artificially inflate apparent model performance in differential gene expression analyses. We proposed two new metrics: Local Energy distance, and a clustering-based Mixing Index that quantifies the co-clustering of predicted and observed perturbed cells.
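As a minimal sketch of why average-based metrics cannot see gene–gene interactions, the example below compares two synthetic populations whose per-gene means and variances match but whose correlation differs. It uses the standard (global) energy distance, E = 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, estimated with a plain V-statistic. The two-gene Gaussian data, the sample size, and the function name are assumptions for illustration. This is not the Local Energy distance or Mixing Index proposed in the abstract, and it does not reproduce the authors' experiments.

```python
import numpy as np


def energy_distance(x, y):
    """V-statistic estimate of the energy distance between two samples.

    x has shape (n, d) and y has shape (m, d); rows are cells, columns genes.
    """
    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (row of a, row of b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


rng = np.random.default_rng(0)
n_cells = 500

# Hypothetical "observed" cells: two genes that are strongly co-regulated.
observed = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.9], [0.9, 1.0]], size=n_cells
)
# Hypothetical "predicted" cells: same marginals, but the genes are independent.
predicted = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]], size=n_cells
)

# Average-based view: largest absolute difference between per-gene means.
mean_gap = np.abs(observed.mean(axis=0) - predicted.mean(axis=0)).max()

# Distribution-based view: energy distance between the two populations,
# plus a self-comparison that is zero by construction.
ed_between = energy_distance(observed, predicted)
ed_self = energy_distance(observed, observed)

print("max per-gene mean gap:", mean_gap)
print("energy distance (observed vs predicted):", ed_between)
print("energy distance (observed vs itself):", ed_self)
```

The per-gene mean gap reflects only sampling noise, because both populations share the same population means. The energy distance stays strictly positive, because the joint distributions differ. This low-dimensional case is deliberately easy. The abstract's point is that the global energy distance may still miss such disruptions in certain contexts.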
Second, we benchmarked two representative deep-learning models (CPA and scPRAM), a conditional autoencoder, multiple baseline strategies, and an idealized reference model defining empirical performance bounds. Across both out-of-distribution and partially in-distribution settings, we found that current deep-learning models exhibit limited generalization capacity and consistently fail to reproduce the full distribution of perturbed cell states, even when partially exposed to target perturbed cells during training.
Together, our results reveal fundamental shortcomings in both existing evaluation practices and current modeling approaches and provide practical guidelines and tools for more reliable benchmarking of single-cell perturbation response models.